Publications

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Please also observe the IEEE, ACM and Springer copyright notices.

Preprints and Selected Papers
  • Neural Topic Model via Optimal Transport
    He Zhao, Dinh Phung, Viet Huynh, Trung Le and Wray Buntine. In Proc. of the 9th Int. Conf. on Learning Representations (ICLR), 2021.
    @INPROCEEDINGS { zhao_etal_iclr2021_neural,
        AUTHOR = { He Zhao and Dinh Phung and Viet Huynh and Trung Le and Wray Buntine },
        BOOKTITLE = { Proc. of the 9th Int. Conf. on Learning Representations (ICLR) },
        TITLE = { Neural Topic Model via Optimal Transport },
        YEAR = { 2021 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 2008.13537 },
        PRIMARYCLASS = { cs.IR },
        TIMESTAMP = { 2021-01-13 },
    }
  • Parameterized Rate-Distortion Stochastic Encoder
    Quan Hoang, Trung Le and Dinh Phung. In Proc. of the 37th International Conference on Machine Learning (ICML), 2020.
    @INPROCEEDINGS { hoang_etal_icml20_parameterized,
        AUTHOR = { Quan Hoang and Trung Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 37th International Conference on Machine Learning (ICML) },
        TITLE = { Parameterized Rate-Distortion Stochastic Encoder },
        YEAR = { 2020 },
    }
  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }
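    Illustrative sketch: the pipeline this abstract describes (three embeddings interacting with a memory via self-attention, then a CNN decoder producing a scalar score) can be approximated in a few lines of PyTorch. This is a loose paraphrase for intuition, with torch.nn.MultiheadAttention standing in for the paper's relational memory network; all sizes and names are hypothetical, not the authors' code.
        import torch
        import torch.nn as nn

        class RMeNSketch(nn.Module):
            # Loose stand-in for R-MeN: a learned memory attended to by the
            # triple's three vectors, then a small CNN decoder to a score.
            def __init__(self, dim=64, slots=8, filters=32):
                super().__init__()
                self.memory = nn.Parameter(torch.randn(slots, dim))
                self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
                self.conv = nn.Conv1d(dim, filters, kernel_size=3)  # decoder over the 3 vectors
                self.out = nn.Linear(filters, 1)

            def forward(self, h, r, t):              # each: (batch, dim)
                seq = torch.stack([h, r, t], dim=1)  # triple as a 3-step sequence
                mem = self.memory.expand(seq.size(0), -1, -1)
                enc, _ = self.attn(seq, mem, mem)    # each step interacts with the memory
                feat = self.conv(enc.transpose(1, 2)).relu().squeeze(-1)
                return self.out(feat).squeeze(-1)    # scalar plausibility score per triple

        score = RMeNSketch()(torch.randn(5, 64), torch.randn(5, 64), torch.randn(5, 64))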
  • Deep Generative Models of Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2020.
    In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models.
    @INPROCEEDINGS { zhao_etal_aistats20_deepgenerative,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { Deep Generative Models of Sparse and Overdispersed Discrete Data },
        BOOKTITLE = { Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2020 },
        ABSTRACT = { In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models. },
        FILE = { :zhao_etal_aistats20_deepgenerative - Deep Generative Models of Sparse and Overdispersed Discrete Data.pdf:PDF },
        URL = { https://www.semanticscholar.org/paper/Deep-Generative-Models-of-Sparse-and-Overdispersed-Zhao-Rai/8136c46488875b09e15e89c08bf02698901322a1 },
    }
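    Illustrative sketch: the core modelling choice in this abstract, a VAE whose decoder emits negative-binomial parameters so the likelihood can absorb overdispersion in sparse counts, can be written compactly with torch.distributions. This is a minimal sketch under my own architectural assumptions, not the paper's model.
        import torch
        import torch.nn as nn
        from torch.distributions import NegativeBinomial, Normal, kl_divergence

        class NBVAESketch(nn.Module):
            def __init__(self, n_features, latent=32):
                super().__init__()
                self.enc = nn.Linear(n_features, 2 * latent)   # -> (mu, logvar)
                self.dec = nn.Linear(latent, 2 * n_features)   # -> NB parameters

            def loss(self, x):                                 # x: (batch, n_features) counts
                mu, logvar = self.enc(x).chunk(2, dim=-1)
                q = Normal(mu, logvar.mul(0.5).exp())
                z = q.rsample()                                # reparameterised latent sample
                log_r, logits = self.dec(z).chunk(2, dim=-1)
                px = NegativeBinomial(total_count=log_r.exp(), logits=logits)
                recon = px.log_prob(x).sum(-1)                 # overdispersed count likelihood
                kl = kl_divergence(q, Normal(0.0, 1.0)).sum(-1)
                return (kl - recon).mean()                     # negative ELBO

        model = NBVAESketch(n_features=100)
        nelbo = model.loss(torch.randint(0, 5, (16, 100)).float())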
  • Learning Generative Adversarial Networks from Multiple Data Sources
    Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2823-2829, July 2019.
    Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
    @INPROCEEDINGS { le_etal_ijcai19_learningGAN,
        AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2019 },
        PAGES = { 2823--2829 },
        MONTH = { July },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },
        FILE = { :le_etal_ijcai19_learningGAN - Learning Generative Adversarial Networks from Multiple Data Sources.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/391 },
    }
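    Illustrative sketch: the push-and-pull idea in this abstract can be phrased as an extra repulsion term in the generator loss, pulling samples toward the primary source while pushing them away from an auxiliary one. The weighting and exact losses below are my guess at the shape of the objective, not the paper's formulation.
        import torch
        import torch.nn.functional as F

        def p2gan_generator_loss(d_primary, d_auxiliary, fake, push_weight=1.0):
            ones = torch.ones(fake.size(0), 1)
            zeros = torch.zeros(fake.size(0), 1)
            # pull: generated samples should look like the primary data source
            pull = F.binary_cross_entropy_with_logits(d_primary(fake), ones)
            # push: and should NOT look like the auxiliary data source
            push = F.binary_cross_entropy_with_logits(d_auxiliary(fake), zeros)
            return pull + push_weight * push

        d1, d2 = torch.nn.Linear(8, 1), torch.nn.Linear(8, 1)   # toy discriminators
        loss = p2gan_generator_loss(d1, d2, torch.randn(4, 8))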
  • Three-Player Wasserstein GAN via Amortised Duality
    Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2202-2208, July 2019.
    We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method.
    @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,
        AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },
        YEAR = { 2019 },
        MONTH = { July },
        PAGES = { 2202--2208 },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method. },
        FILE = { :dam_etal_ijcai19_3pwgan - Three Player Wasserstein GAN Via Amortised Duality.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/305 },
    }
  • Learning How to Active Learn by Dreaming
    Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, July 2019.
    @INPROCEEDINGS { vu_etal_acl19_learning,
        AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },
        TITLE = { Learning How to Active Learn by Dreaming },
        BOOKTITLE = { Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) },
        YEAR = { 2019 },
        ADDRESS = { Florence, Italy },
        MONTH = { jul },
    }
  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, June 2019.
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }
  • Probabilistic Multilevel Clustering via Composite Transportation Distance
    Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, April 2019.
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_aistats19_probabilistic,
        AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },
        TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },
        BOOKTITLE = { Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2019 },
        ADDRESS = { Okinawa, Japan },
        MONTH = { apr },
        ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },
        FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },
        URL = { https://arxiv.org/abs/1810.11911 },
    }
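    For intuition, the composite transportation distance named in this abstract can be read (on my reading of the abstract, not quoting the paper) as an optimal transport problem between two mixtures whose ground cost is the KL divergence between component densities:
        % mixtures P = sum_i p_i delta_{theta_i} and Q = sum_j q_j delta_{phi_j}
        \[
          d_{\mathrm{CT}}(P, Q) \;=\; \min_{\pi \in \Pi(p, q)}
          \sum_{i,j} \pi_{ij}\, \mathrm{KL}\big( f(\cdot \mid \theta_i) \,\big\|\, f(\cdot \mid \phi_j) \big)
        \]
        % where \Pi(p, q) is the set of couplings with marginals p and q.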
  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019.
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }
  • Robust Anomaly Detection in Videos using Multilevel Representations
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019.
    @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },
        BOOKTITLE = { Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2019 },
        ADDRESS = { Honolulu, USA },
        FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },
        GROUPS = { Anomaly Detection },
        URL = { https://github.com/SeaOtter/vad_gan },
    }
  • Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data
    Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, August 2018.
    Kernel methods are powerful supervised machine learning models for their strong generalization ability, especially on limited data to effectively generalize on unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both the Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM without tuning any parameter can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining significant speedup in terms of the total training time compared with its rivals.
    @INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },
        TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },
        BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },
        YEAR = { 2018 },
        ADDRESS = { London, UK },
        MONTH = { aug },
        PUBLISHER = { ACM },
        ABSTRACT = { Kernel methods are powerful supervised machine learning models for their strong generalization ability, especially on limited data to effectively generalize on unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both the Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM without tuning any parameter can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining significant speedup in terms of the total training time compared with its rivals. },
        FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },
    }
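    Illustrative sketch: the inference engine named in the title is standard Stein variational gradient descent (Liu and Wang, 2016), which transports a set of particles by a kernelised gradient. The toy NumPy step below targets a generic log-density rather than the paper's BKM posterior.
        import numpy as np

        def svgd_step(particles, grad_log_p, bandwidth=1.0, step=1e-2):
            # particles: (n, d); grad_log_p maps (n, d) -> (n, d)
            diff = particles[:, None, :] - particles[None, :, :]      # x_i - x_j
            k = np.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))   # RBF kernel matrix
            grad_k = diff * (k / bandwidth ** 2)[..., None]           # grad of k w.r.t. x_j
            phi = (k @ grad_log_p(particles) + grad_k.sum(axis=1)) / len(particles)
            return particles + step * phi     # driving term + repulsive term

        x = np.random.randn(50, 2) + 3.0
        for _ in range(200):
            x = svgd_step(x, lambda p: -p)    # grad log N(0, I); particles drift to the origin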
  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018.
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation was able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }
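    Illustrative sketch: the sampling mechanism in this abstract, a mixture of K generators with one picked at random per sample, is easy to write down. The paper's classifier, discriminator, and parameter sharing are omitted, and the toy linear generators below are my own placeholders.
        import torch
        import torch.nn as nn

        class MixtureGeneratorSketch(nn.Module):
            def __init__(self, k=4, z_dim=16, x_dim=2):
                super().__init__()
                self.gens = nn.ModuleList(nn.Linear(z_dim, x_dim) for _ in range(k))
                self.z_dim = z_dim

            def forward(self, n):
                z = torch.randn(n, self.z_dim)
                idx = torch.randint(len(self.gens), (n,))      # mixture component per sample
                out = torch.stack([g(z) for g in self.gens])   # (k, n, x_dim)
                return out[idx, torch.arange(n)], idx          # sample and its generator index

        samples, which = MixtureGeneratorSketch()(8)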
  • Geometric enclosing networks
    Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 2355-2361, July 2018.
    Training model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G (z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data.
    @INPROCEEDINGS { le_etal_ijcai18_geometric,
        AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Geometric enclosing networks },
        BOOKTITLE = { Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, {IJCAI-18} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        PAGES = { 2355--2361 },
        YEAR = { 2018 },
        MONTH = { July },
        ABSTRACT = { Training model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G (z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data. },
        FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },
    }
  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018.
    We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce a novel embedding method for knowledge base completion task. Our approach advances state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters having a same shape of 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }
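    The scoring function in this abstract is concrete enough to sketch directly: a (k x 3) matrix of [head; relation; tail] embeddings, 1x3 convolution filters over its rows, concatenated feature maps, and a dot product down to a scalar. A minimal PyTorch rendering, with dimensions of my own choosing:
        import torch
        import torch.nn as nn

        class ConvKBSketch(nn.Module):
            def __init__(self, dim=50, filters=3):
                super().__init__()
                self.conv = nn.Conv2d(1, filters, kernel_size=(1, 3))  # 1x3 filters
                self.w = nn.Linear(filters * dim, 1, bias=False)       # dot-product scorer

            def forward(self, h, r, t):                     # each: (batch, dim)
                x = torch.stack([h, r, t], dim=2)           # (batch, dim, 3) triple matrix
                maps = self.conv(x.unsqueeze(1)).relu()     # (batch, filters, dim, 1)
                return self.w(maps.flatten(1)).squeeze(-1)  # scalar score per triple

        s = ConvKBSketch()(torch.randn(4, 50), torch.randn(4, 50), torch.randn(4, 50))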
  • Learning Graph Representation via Frequent Subgraphs
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. (Student travel award).
    @INPROCEEDINGS { nguyen_etal_sdm18_learning,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Learning Graph Representation via Frequent Subgraphs },
        BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },
        YEAR = { 2018 },
        PUBLISHER = { SIAM },
        NOTE = { Student travel award },
        FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.01.12 },
    }
  • Model-Based Learning for Point Pattern Data
    Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018.
    This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
    @ARTICLE { vo_etal_pr18_modelbased,
        AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },
        JOURNAL = { Pattern Recognition (PR) },
        TITLE = { Model-Based Learning for Point Pattern Data },
        YEAR = { 2018 },
        ISSN = { 0031-3203 },
        PAGES = { 136--151 },
        VOLUME = { 84 },
        ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },
        DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },
        FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },
        KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },
        PUBLISHER = { Elsevier },
        URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },
    }
  • Dual Discriminator Generative Adversarial Nets
    Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS), pages 2667-2677, USA, 2017.
    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database.
    @INPROCEEDINGS { tu_etal_nips17_d2gan,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },
        TITLE = { Dual Discriminator Generative Adversarial Nets },
        BOOKTITLE = { Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS) },
        YEAR = { 2017 },
        SERIES = { NIPS'17 },
        PAGES = { 2667--2677 },
        ADDRESS = { USA },
        PUBLISHER = { Curran Associates Inc. },
        ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database. },
        ACMID = { 3295027 },
        FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },
        ISBN = { 978-1-5108-6096-4 },
        LOCATION = { Long Beach, California, USA },
        NUMPAGES = { 11 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.06 },
        URL = { http://dl.acm.org/citation.cfm?id=3294996.3295027 },
    }
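    Illustrative sketch: the abstract's minimax game, one discriminator rewarding data samples, the other rewarding generated ones, and a generator fooling both, roughly takes the form below. The softplus outputs and exact signs are my paraphrase; consult the paper for the precise objective whose optimum yields the KL plus reverse-KL combination.
        import torch
        import torch.nn.functional as F

        def d2gan_losses(d1, d2, real, fake, alpha=1.0, beta=1.0):
            s = F.softplus                                   # keep both scores positive
            j = (alpha * torch.log(s(d1(real))).mean() - s(d1(fake)).mean()
                 - s(d2(real)).mean() + beta * torch.log(s(d2(fake))).mean())
            return -j, j    # discriminators ascend J; the generator descends it

        d1, d2 = torch.nn.Linear(8, 1), torch.nn.Linear(8, 1)   # toy discriminators
        d_loss, g_loss = d2gan_losses(d1, d2, torch.randn(4, 8), torch.randn(4, 8))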
  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017.
    One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }
  • Supervised Restricted Boltzmann Machines
    Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
    We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and unpractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation.
    @INPROCEEDINGS { nguyen_etal_uai17supervised,
        AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },
        TITLE = { Supervised Restricted Boltzmann Machines },
        BOOKTITLE = { Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and unpractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation. },
        FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.08.29 },
        URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },
    }
  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017.
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }
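    Illustrative sketch: in one dimension the Wasserstein barycenter of equal-size empirical measures is just the pointwise mean of their sorted samples (averaged quantiles), so a toy version of the Wasserstein-means alternation looks like K-means over distributions. This 1-D simplification, assuming equal sample sizes and non-empty clusters, is mine, not the paper's general algorithm.
        import numpy as np

        def w2_1d(a, b):
            # W2 between equal-size, pre-sorted 1-D empirical measures
            return np.sqrt(np.mean((a - b) ** 2))

        def wasserstein_means_1d(groups, k, iters=20, seed=0):
            rng = np.random.default_rng(seed)
            gs = [np.sort(g) for g in groups]
            centers = [gs[i] for i in rng.choice(len(gs), k, replace=False)]
            for _ in range(iters):
                # assignment step: nearest barycenter in W2
                labels = [min(range(k), key=lambda j: w2_1d(g, centers[j])) for g in gs]
                # update step: 1-D barycenter = mean of sorted samples
                centers = [np.mean([g for g, l in zip(gs, labels) if l == j], axis=0)
                           for j in range(k)]
            return labels, centers

        groups = [np.random.randn(100) + c for c in (0, 0, 5, 5)]
        labels, _ = wasserstein_means_1d(groups, k=2)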
  • Approximation Vector Machines for Large-scale Online Learning
    Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017.
    One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size.
    @ARTICLE { le_etal_jmlr17approximation,
        AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },
        TITLE = { Approximation Vector Machines for Large-scale Online Learning },
        JOURNAL = { Journal of Machine Learning Research (JMLR) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. },
        FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },
        KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },
        URL = { https://arxiv.org/abs/1604.06518 },
    }
J
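    The bookkeeping behind AVM's approximation step, as described in the abstract above, is simple enough to sketch. The snippet below is an illustrative reading, not the paper's implementation; delta stands for the predefined threshold and core_points for the stored vectors that bound the model size.

        import numpy as np

        def approximate(x, core_points, delta):
            # Return the index of the stored point that will represent x.
            # If the nearest stored point lies within distance delta, x is
            # approximated by it; otherwise x is admitted as a new point,
            # which is the only way the model can grow.
            if core_points:
                dists = [float(np.linalg.norm(x - c)) for c in core_points]
                j = int(np.argmin(dists))
                if dists[j] <= delta:
                    return j
            core_points.append(x)
            return len(core_points) - 1

    In the full algorithm the stochastic gradient update triggered by x is then applied to the coefficient of the returned point, so the model size stays bounded while the analysis controls the approximation gap.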
  • Discriminative Bayesian Nonparametric Clustering
    Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models.
    @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,
        AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },
        TITLE = { Discriminative Bayesian Nonparametric Clustering },
        BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models. },
        FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },
        URL = { https://www.ijcai.org/proceedings/2017/355 },
    }
C
  • Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model in an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
    @INPROCEEDINGS { tu_etal_ijcai17_rrf,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },
        YEAR = { 2017 },
        SERIES = { IJCAI'17 },
        ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model in an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. },
        FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },
        LOCATION = { Melbourne, Australia },
        NUMPAGES = { 7 },
        URL = { https://www.ijcai.org/proceedings/2017/354 },
    }
C
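    The reparameterization in RRF is easy to illustrate for the RBF kernel, where Bochner's theorem gives Fourier frequencies omega ~ N(0, sigma^-2 I): writing omega = eps / sigma with eps ~ N(0, I) moves the randomness out of the kernel parameter sigma, so the random features become differentiable in sigma. A minimal sketch under these assumptions (all names are illustrative):

        import numpy as np

        rng = np.random.default_rng(0)

        def rff_reparam(X, eps, b, sigma):
            # Random Fourier features for the RBF kernel with bandwidth
            # sigma. eps ~ N(0, I) and b ~ U[0, 2*pi) are sampled once;
            # omega = eps / sigma, so the map is differentiable in sigma
            # and stochastic gradients w.r.t. sigma are available.
            omega = eps / sigma
            D = eps.shape[0]
            return np.sqrt(2.0 / D) * np.cos(X @ omega.T + b)

        # Illustrative usage: Z @ Z.T approximates the RBF Gram matrix.
        X = rng.normal(size=(5, 2))
        eps = rng.normal(size=(100, 2))
        b = rng.uniform(0.0, 2.0 * np.pi, size=100)
        Z = rff_reparam(X, eps, b, sigma=1.0)

    Since sigma now enters the feature map deterministically, an online SGD step can update it alongside the output weights, which is the direct application of stochastic gradient descent the abstract mentions.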
  • Hierarchical semi-Markov conditional random fields for deep recursive sequential data
    Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. [ | | pdf]
    We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random field. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { tran_etal_aij17hierarchical,
        AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },
        TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2017 },
        MONTH = { Feb. },
        ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random field. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },
        KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.21 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },
    }
J
  • See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I described the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version for the directed case has also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
  • Column Networks for Collective Classification
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf]
    Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.
    @CONFERENCE { pham_etal_aaai17column,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Column Networks for Collective Classification },
        BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2017 },
        ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. },
        COMMENT = { Accepted },
        FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.14 },
        URL = { https://arxiv.org/abs/1609.04508 },
    }
C
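    One way to picture a CLN layer, following the abstract above: each node's next hidden state mixes its own state with the mean state of its neighbours under each relation, with one weight matrix per relation. The sketch below is an illustrative simplification (the full model also uses highway-style connections); all names and the choice of ReLU are assumptions.

        import numpy as np

        def cln_layer(H, adj_per_rel, W, V, b):
            # H: (n_nodes, d_in) hidden states from the previous layer.
            # adj_per_rel: list of (n_nodes, n_nodes) 0/1 adjacency
            #              matrices, one per relation type.
            # W: (d_in, d_out); V: list of (d_in, d_out), one per
            #    relation; b: (d_out,).
            out = H @ W + b
            for A, Vr in zip(adj_per_rel, V):
                deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # avoid /0
                out += ((A @ H) / deg) @ Vr  # mean neighbour state per relation
            return np.maximum(out, 0.0)      # ReLU (illustrative choice)

    Stacking several such layers gives each node access to progressively longer-range dependencies, which is the source of property (iv) above, while the per-relation weights keep the parameter count linear in the number of relations.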
  • Dual Space Gradient Descent for Online Learning
    Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing Systems (NIPS), December 2016. [ | | pdf]
    One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain the information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines.
    @CONFERENCE { le_etal_nips16dual,
        AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },
        TITLE = { Dual Space Gradient Descent for Online Learning },
        BOOKTITLE = { Advances in Neural Information Processing Systems (NIPS) },
        YEAR = { 2016 },
        MONTH = { December },
        ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain the information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines. },
        FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.16 },
        URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },
    }
C
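    The key budget-maintenance idea in DualSGD can be sketched in a few lines: an evicted support vector is not discarded; instead, its weighted random-feature image is folded into an auxiliary weight vector that lives in the random feature space. Illustrative only; feat is any random feature map, such as the RFF sketch earlier, and the eviction rule shown (oldest first) is an assumption.

        import numpy as np

        def evict_to_feature_space(support, alphas, w_rf, feat):
            # Pop one support vector (here the oldest) and retain its
            # information by adding its weighted random-feature image to
            # the auxiliary weight vector w_rf. Prediction then combines
            # the remaining kernel expansion with np.dot(w_rf, feat(x)).
            x = support.pop(0)
            a = alphas.pop(0)
            w_rf += a * feat(x)
            return w_rf

    Because only evicted points go through the random feature map, the feature dimension can stay modest without the accuracy loss a purely random-feature model would suffer, which is the trade-off the abstract describes.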
  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [ | | pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }
C
  • Budgeted Semi-supervised Support Vector Machine
    Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf]
    @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,
        AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },
        TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },
        BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },
    }
C
  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }
C
  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expensive running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable to or better than those of state-of-the-art baselines, whilst the execution time is an order of magnitude faster. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation for selecting optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expensive running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable to or better than those of state-of-the-art baselines, whilst the execution time is an order of magnitude faster. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation for selecting optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }
C
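    The label-drift setting above is, at the data-management level, a matter of bookkeeping: per-class statistics must be allocated on the fly the first time a label appears. The sketch below shows only that single-pass bookkeeping, not the OLR estimator itself; all names are illustrative.

        import numpy as np
        from collections import defaultdict

        class LabelDriftStats:
            # Running per-class sufficient statistics over a stream.
            # A never-seen-before label simply creates a new entry, so
            # the class set grows with the stream in one pass.
            def __init__(self, dim):
                self.count = defaultdict(int)
                self.total = defaultdict(lambda: np.zeros(dim))

            def update(self, x, y):
                self.count[y] += 1
                self.total[y] += x

            def class_means(self):
                return {y: self.total[y] / self.count[y] for y in self.count}

    Because each sample touches only its own class's statistics, updates commute across samples, which is also what makes the scheme invariant to data shuffling and easy to parallelize in the Spark-OLR spirit.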
  • A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process
    Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors, and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated.
    @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },
        TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2016 },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors, and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested {D}irichlet process theory, which allows a nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },
        DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },
        FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.17 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },
    }
J
  • Streaming Variational Inference for Dirichlet Process Mixtures
    Huynh, V., Phung, D. and Venkatesh, S. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf]
    Bayesian nonparametric models are theoretically suitable for learning streaming data since their complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.
    @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,
        AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },
        TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },
        BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2015 },
        PAGES = { 237--252 },
        MONTH = { Nov. },
        ABSTRACT = { Bayesian nonparametric models are theoretically suitable for learning streaming data since their complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data. },
        FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },
    }
C
  • Tensor-variate Restricted Boltzmann Machines
    Nguyen, Tu, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 2887-2893, Austin, Texas, USA, January 2015. [ | | pdf]
    Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_aaai15,
        TITLE = { Tensor-variate Restricted {B}oltzmann Machines },
        AUTHOR = { Nguyen, Tu and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2015 },
        ADDRESS = { Austin, Texas, USA },
        MONTH = { January },
        PAGES = { 2887--2893 },
        ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. },
        KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },
        OWNER = { ngtu },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9371 },
    }
C
  • Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process
    Phung, Dinh, Nguyen, T. C., Gupta, S. and Venkatesh, Svetha. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174. Elsevier, 2014. [ | | pdf | code]
    Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., the RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for the detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups.
    @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,
        TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },
        AUTHOR = { Phung, Dinh and Nguyen, T. C. and Gupta, S. and Venkatesh, Svetha },
        BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },
        PUBLISHER = { Elsevier },
        YEAR = { 2014 },
        EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. Goldman },
        PAGES = { 149--174 },
        ABSTRACT = { Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., the RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for the detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { ctng },
        TIMESTAMP = { 2013.07.25 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },
    }
BC
  • A Random Finite Set Model for Data Clustering
    Phung, Dinh and Vo, Ba-Ngu. In Proc. of Intl. Conf. on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf]
    The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
    @INPROCEEDINGS { phung_vo_fusion14,
        TITLE = { A Random Finite Set Model for Data Clustering },
        AUTHOR = { Phung, Dinh and Vo, Ba-Ngu },
        BOOKTITLE = { Proc. of Intl. Conf. on Fusion (FUSION) },
        YEAR = { 2014 },
        ADDRESS = { Salamanca, Spain },
        MONTH = { July },
        ABSTRACT = { The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.05.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },
    }
C
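    The building block of the model above, the likelihood of one set-valued datum under a Poisson random finite set, is compact enough to write down. A minimal sketch, assuming Poisson cardinality and i.i.d. Gaussian points for a single mixture component (names are illustrative):

        import numpy as np
        from scipy.stats import multivariate_normal

        def poisson_rfs_loglik(X, lam, mean, cov):
            # Poisson RFS density: p(X) = exp(-lam) * prod_{x in X} lam * f(x),
            # where f is the single-point (here Gaussian) density. X is an
            # (n, d) array whose row count n varies from datum to datum.
            n = X.shape[0]
            ll = -lam + n * np.log(lam)
            if n:
                ll += np.sum(multivariate_normal.logpdf(X, mean=mean, cov=cov))
            return ll

    In the full model, lam, mean and cov are cluster-specific, and a Dirichlet process mixture over them lets the number of clusters be inferred rather than fixed in advance.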
  • Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter
    Vo, Ba-Ngu, Vo, Ba-Tuong and Phung, Dinh. IEEE Transactions on Signal Processing (TSP), 62(24):6554-6567, 2014. [ | ]
    @ARTICLE { vo_vo_phung_tsp14,
        AUTHOR = { Vo, Ba-Ngu and Vo, Ba-Tuong and Phung, Dinh },
        TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },
        JOURNAL = { IEEE Transactions on Signal Processing (TSP) },
        YEAR = { 2014 },
        VOLUME = { 62 },
        NUMBER = { 24 },
        PAGES = { 6554--6567 },
        FILE = { :vo_vo_phung_tsp14 - Labeled Random Finite Sets and the Bayes Multi Target Tracking Filter.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2014.07.02 },
    }
J
  • Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
    Vu Nguyen, Phung, Dinh, XuanLong Nguyen, Venkatesh, Svetha and Hung Bui. In Proc. of Intl. Conf. on Machine Learning (ICML), pages 288-296, 2014. [ | ]
    @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,
        TITLE = { Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts },
        AUTHOR = { Vu Nguyen and Phung, Dinh and XuanLong Nguyen and Venkatesh, Svetha and Hung Bui },
        BOOKTITLE = { Proc. of Intl. Conf. on Machine Learning (ICML) },
        YEAR = { 2014 },
        PAGES = { 288--296 },
        OWNER = { tvnguye },
        TIMESTAMP = { 2013.12.13 },
    }
C
  • Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions
    Sunil Kumar Gupta, Santu Rana, Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 235-243, 2014. [ | | pdf]
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,
        TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },
        AUTHOR = { Sunil Kumar Gupta and Santu Rana and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2014 },
        PAGES = { 235-243 },
        CHAPTER = { 27 },
        DOI = { 10.1137/1.9781611973440.27 },
        EPRINT = { http://epubs.siam.org/doi/pdf/10.1137/1.9781611973440.27 },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611973440.27 },
    }
C
  • An Integrated Framework for Suicide Risk Prediction
    Tran, Truyen, Phung, Dinh, Luo, Wei, Harvey, R., Berk, M. and Venkatesh, Svetha. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Chicago, US, 2013. [ | ]
    @INPROCEEDINGS { tran_phung_luo_harvey_berk_venkatesh_kdd13,
        TITLE = { An Integrated Framework for Suicide Risk Prediction },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Harvey, R. and Berk, M. and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2013 },
        ADDRESS = { Chicago, US },
        OWNER = { Dinh },
        TIMESTAMP = { 2013.06.07 },
    }
C
  • Thurstonian Boltzmann Machines: Learning from Multiple Inequalities
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis.
    @INPROCEEDINGS { tran_phung_venkatesh_icml13,
        TITLE = { {T}hurstonian {B}oltzmann Machines: Learning from Multiple Inequalities },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis. },
        OWNER = { dinh },
        TIMESTAMP = { 2013.03.01 },
    }
C
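    The "learning from inequalities" view above has a concrete reading: each observed value pins its underlying latent Gaussian variable inside an interval. The sketch below translates a few of the supported types into such intervals; the threshold handling and all names are illustrative assumptions, not the paper's code.

        import math

        def to_interval(value, kind, thresholds=None):
            # Map one observed input to the interval it imposes on its
            # underlying latent Gaussian variable; learning then only
            # ever sees these inequality constraints.
            if kind == "binary":    # 1 <=> u > 0, 0 <=> u <= 0
                return (0.0, math.inf) if value else (-math.inf, 0.0)
            if kind == "ordinal":   # level k <=> theta[k-1] < u <= theta[k]
                lo = -math.inf if value == 0 else thresholds[value - 1]
                hi = math.inf if value == len(thresholds) else thresholds[value]
                return (lo, hi)
            if kind == "gaussian":  # observed exactly: a degenerate interval
                return (value, value)
            raise ValueError(f"unsupported kind: {kind}")

    A mixed record (say, one binary response, one ordinal rating and one real measurement) thus becomes a uniform list of intervals, which is what lets a single architecture handle all the listed data types.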
  • Factorial Multi-Task Learning : A Bayesian Nonparametric Approach
    Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Proceedings of International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods.
    @INPROCEEDINGS { gupta_phung_venkatesh_icml13,
        TITLE = { Factorial Multi-Task Learning : A Bayesian Nonparametric Approach },
        AUTHOR = { Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods. },
        OWNER = { Dinh },
        TIMESTAMP = { 2013.04.16 },
    }
C
  • Sparse Subspace Clustering via Group Sparse Coding
    Saha, B., Pham, D.S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 130-138, 2013. [ | ]
    Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer operates at the group level, where we seek sparsity between groups but density within each group. The second regularizer models the interactions down to the data point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care to image and text clustering benchmark datasets, and show that they outperform the state of the art considerably.
    @INPROCEEDINGS { saha_pham_phung_venkatesh_sdm13,
        TITLE = { Sparse Subspace Clustering via Group Sparse Coding },
        AUTHOR = { Saha, B. and Pham, D.S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2013 },
        PAGES = { 130-138 },
        ABSTRACT = { Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer operates at the group level, where we seek sparsity between groups but density within each group. The second regularizer models the interactions down to the data point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care to image and text clustering benchmark datasets, and show that they outperform the state of the art considerably. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Bayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster)
    Phung, Dinh. In International Conference on Bayesian Nonparametrics, Amsterdam, The Netherlands, June 10-14 2013. [ | | code | poster]
    When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; a patient's demographic information, medical history and drug usage; a social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a full Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). Stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being the DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and will again be employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time, using the NIPS and PNAS datasets, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, and c) an application on medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction.
    @INPROCEEDINGS { phung_bnp13,
        TITLE = { {B}ayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) },
        AUTHOR = { Phung, Dinh },
        BOOKTITLE = { International Conference on Bayesian Nonparametrics },
        YEAR = { 2013 },
        ADDRESS = { Amsterdam, The Netherlands },
        MONTH = { June 10-14 },
        ABSTRACT = { When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; a patient's demographic information, medical history and drug usage; a social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a full Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). Stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being the DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and will again be employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time, using the NIPS and PNAS datasets, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, and c) an application on medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { dinh },
        POSTER = { http://prada-research.net/~dinh/uploads/Main/Publications/A0_poster_BNP13.pdf },
        TIMESTAMP = { 2013.03.01 },
    }
C
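    As a companion to the abstract above: the stick-breaking representation it relies on can be sketched in a few lines. This is a generic truncated stick-breaking construction of DP mixture weights, not the paper's sampler; the concentration alpha, the truncation level and the N(0,1) base measure are arbitrary illustrative choices.
        import numpy as np

        def stick_breaking_weights(alpha, num_sticks, rng):
            # Draw Beta(1, alpha) stick proportions and convert them to
            # mixture weights pi_k = v_k * prod_{j<k} (1 - v_j).
            v = rng.beta(1.0, alpha, size=num_sticks)
            remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
            return v * remaining

        rng = np.random.default_rng(0)
        weights = stick_breaking_weights(alpha=2.0, num_sticks=50, rng=rng)
        atoms = rng.normal(0.0, 1.0, size=50)  # topic atoms from a N(0,1) base measure
        print(weights.sum())                   # close to 1 for a reasonable truncation
    The explicit atoms produced this way are what the model above uses to index the conditional context distributions.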
  • Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis
    Phung, Dinh, Gupta, S. K., Nguyen, T. and Venkatesh, Svetha. IEEE Transactions on Multimedia (TMM), 15:1316-1325, 2013. [ | | pdf]
    Social capital indicative of community interaction and support is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use a novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying patterns and degrees of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being.
    @ARTICLE { phung_gupta_nguyen_venkatesh_tmm13,
        TITLE = { Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis },
        AUTHOR = { Phung, Dinh and Gupta, S. K. and Nguyen, T. and Venkatesh, Svetha },
        JOURNAL = { IEEE Transactions on Multimedia (TMM) },
        YEAR = { 2013 },
        PAGES = { 1316-1325 },
        VOLUME = { 15 },
        ABSTRACT = { Social capital indicative of community interaction and support is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use a novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying patterns and degrees of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. },
        ISSN = { 1520-9210 },
        LANGUAGE = { English },
        TIMESTAMP = { 2013.04.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/HomePage/phung_gupta_nguyen_venkatesh_tmm13.pdf },
    }
J
  • Regularized nonnegative shared subspace learning
    Gupta, Sunil Kumar, Phung, Dinh, Adams, Brett and Venkatesh, Svetha. Data Mining and Knowledge Discovery, 26(1):57-97, 2013. [ | ]
    @ARTICLE { gupta_phung_adams_venkatesh_dami13,
        TITLE = { Regularized nonnegative shared subspace learning },
        AUTHOR = { Gupta, Sunil Kumar and Phung, Dinh and Adams, Brett and Venkatesh, Svetha },
        JOURNAL = { Data Mining and Knowledge Discovery },
        YEAR = { 2013 },
        NUMBER = { 1 },
        PAGES = { 57--97 },
        VOLUME = { 26 },
        OWNER = { thinng },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2015.01.29 },
    }
J
  • A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning
    Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI), pages 316-325, 2012. [ | ]
    @INPROCEEDINGS { gupta_phung_venkatesh_uai12,
        TITLE = { A Slice Sampler for Restricted Hierarchical {B}eta Process with Applications to Shared Subspace Learning },
        AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2012 },
        PAGES = { 316--325 },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 200-211, 2012. [ | ]
    Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real world problems -- transfer learning in text and image retrieval.
    @INPROCEEDINGS { gupta_phung_venkatesh_sdm12,
        TITLE = { A {B}ayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2012 },
        PAGES = { 200--211 },
        ABSTRACT = { Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real world problems -- transfer learning in text and image retrieval. },
    }
C
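    To make the idea of nonparametrically inferred shared and individual factors concrete, here is a toy finite approximation of a beta process prior over factor usage across two sources. The flat (non-hierarchical) construction and the truncation K are simplified illustrative choices, not the paper's model or sampler.
        import numpy as np

        rng = np.random.default_rng(1)
        K, n_sources = 30, 2                       # finite truncation of the factor dictionary

        # Finite beta-Bernoulli approximation: global factor inclusion
        # probabilities, then per-source Bernoulli factor usage.
        pi = rng.beta(1.0 / K, 1.0, size=K)        # global factor probabilities
        Z = rng.random((n_sources, K)) < pi        # which factors each source switches on

        shared = np.flatnonzero(Z.all(axis=0))     # factors active in every source
        individual = [np.flatnonzero(Z[s] & ~Z[1 - s]) for s in range(n_sources)]
        print(len(shared), [len(ix) for ix in individual])
    In the paper's hierarchical version the number of shared and individual factors is inferred from data rather than fixed, which is the point of the nonparametric prior.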
  • A Sequential Decision Approach to Ordinal Preferences in Recommender Systems
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 676-682, 2012. [ | ]
    We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make a novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach is flexible enough to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_aaai12,
        TITLE = { A Sequential Decision Approach to Ordinal Preferences in Recommender Systems },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2012 },
        PAGES = { 676--682 },
        ABSTRACT = { We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make a novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach is flexible enough to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods. },
        TIMESTAMP = { 2012.04.11 },
    }
C
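    A small generative sketch of the stagewise rating process described above, assuming made-up per-level thresholds and using the Gumbel distribution (one member of the generalised extreme value family); the paper's actual parameterisation and user/item features are not reproduced here.
        import numpy as np

        def sample_ordinal_rating(thresholds, rng):
            # Walk up the ordinal levels; at each level draw a latent utility
            # and stop at the first level whose utility clears its threshold.
            for level, tau in enumerate(thresholds, start=1):
                utility = rng.gumbel(loc=0.0, scale=1.0)
                if utility > tau:
                    return level
            return len(thresholds)  # fall back to the top level

        rng = np.random.default_rng(2)
        thresholds = [1.5, 1.0, 0.5, 0.0, -0.5]   # illustrative, one per star level
        ratings = [sample_ordinal_rating(thresholds, rng) for _ in range(10000)]
        print(np.bincount(ratings)[1:])           # empirical rating distribution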
  • Improved Subspace Clustering via Exploitation of Spatial Constraints
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 550-557, 2012. [ | ]
    We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat the missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_cvpr12,
        TITLE = { Improved Subspace Clustering via Exploitation of Spatial Constraints },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2012 },
        PAGES = { 550--557 },
        ABSTRACT = { We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat the missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
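    The weighted sparse formulation can be illustrated with a standard reduction: a weighted l1 penalty becomes an ordinary lasso after rescaling the dictionary columns. The weights below merely stand in for the paper's spatially derived weights and are purely illustrative.
        import numpy as np
        from sklearn.linear_model import Lasso

        def weighted_sparse_code(D, y, weights, lam):
            # Weighted l1 coding: min ||y - D c||^2 + lam * sum_i w_i |c_i|.
            # Dividing column i of D by w_i reduces this to a standard lasso,
            # then the recovered coefficients are rescaled back.
            D_tilde = D / weights                    # broadcasts over columns
            lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
            lasso.fit(D_tilde, y)
            return lasso.coef_ / weights

        rng = np.random.default_rng(3)
        D = rng.normal(size=(40, 25))
        y = D[:, :3] @ np.array([1.0, -0.5, 0.8])    # y lies in a small sub-dictionary
        weights = np.ones(25); weights[10:] = 5.0    # penalise "spatially distant" atoms more
        print(np.flatnonzero(np.abs(weighted_sparse_code(D, y, weights, 0.05)) > 1e-3))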
  • Sparse Subspace Representation for Spectral Document Clustering
    Saha, B., Phung, Dinh, Pham, D.S. and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), pages 1092-1097, 2012. [ | ]
    @INPROCEEDINGS { saha_phung_pham_venkatesh_icdm12,
        TITLE = { Sparse Subspace Representation for Spectral Document Clustering },
        AUTHOR = { Saha, B. and Phung, Dinh and Pham, D.S. and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },
        YEAR = { 2012 },
        PAGES = { 1092--1097 },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • Detection of Cross-Channel Anomalies
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 35(1):33-59, 2013. [ | ]
    The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @ARTICLE { pham_budhaditya_phung_venkatesh_kais13,
        TITLE = { Detection of Cross-Channel Anomalies },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2013 },
        NUMBER = { 1 },
        PAGES = { 33--59 },
        VOLUME = { 35 },
        ABSTRACT = { The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
    }
J
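    A minimal sketch of the two-stage idea on synthetic data: per-channel spectral (PCA-residual) scores, then amalgamation across channels. The thresholds, dimensions and planted event are illustrative choices, not the paper's construction.
        import numpy as np

        def channel_scores(X, k=2):
            # Stage 1 (per channel): residual energy outside the channel's
            # top-k principal subspace, a simple spectral anomaly score.
            Xc = X - X.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
            resid = Xc - Xc @ Vt[:k].T @ Vt[:k]
            return np.linalg.norm(resid, axis=1)

        rng = np.random.default_rng(4)
        channels = []
        for _ in range(3):                     # three synthetic data channels
            low_rank = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10)) * 3.0
            X = low_rank + 0.3 * rng.normal(size=(200, 10))
            X[137] += 3.0                      # the same event leaks into every channel
            channels.append(X)

        # Stage 2: amalgamate single-channel anomalies; a time point that is
        # anomalous in all channels is flagged as a cross-channel anomaly.
        flags = []
        for X in channels:
            s = channel_scores(X)
            flags.append(s > np.quantile(s, 0.99))
        print(np.flatnonzero(np.logical_and.reduce(flags)))   # expect index 137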
  • Detection of Cross-Channel Anomalies From Multiple Data Channels
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), Vancouver, Canada, December 2011. [ | ]
    We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_icdm11,
        TITLE = { Detection of Cross-Channel Anomalies From Multiple Data Channels },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },
        YEAR = { 2011 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { December },
        ABSTRACT = { We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
        COMMENT = { coauthor },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Procs. of SIAM Intl. Conf. on Data Mining (SDM), Arizona, USA, April 2011. [ | | pdf]
    Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space at each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.
    @INPROCEEDINGS { truyen_phung_venkatesh_sdm11,
        TITLE = { Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Procs. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2011 },
        ADDRESS = { Arizona, USA },
        MONTH = { April },
        ABSTRACT = { Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space at each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\truyen_phung_venkatesh_sdm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Truyen_etal_sdm11.pdf },
    }
C
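    A toy stagewise sampler over ordered partitions, in the spirit of the discrete-choice view described above. The exp-score choice probabilities and the geometric block lengths are illustrative assumptions rather than the paper's parameterisation.
        import numpy as np

        def sample_ordered_partition(scores, rng, stop=0.3):
            # Stagewise generative sketch: at every stage grow the next block
            # (a tie group) by picking remaining items with probability
            # proportional to exp(score), then randomly close the block.
            remaining = list(range(len(scores)))
            partition = []
            while remaining:
                block = []
                while remaining:
                    w = np.exp([scores[i] for i in remaining])
                    i = rng.choice(len(remaining), p=w / w.sum())
                    block.append(remaining.pop(i))
                    if rng.random() < stop:      # geometric block length, illustrative
                        break
                partition.append(block)
            return partition

        rng = np.random.default_rng(5)
        print(sample_ordered_partition(np.array([2.0, 1.0, 0.5, 0.0, -1.0]), rng))
    Because each stage only chooses from the items still unranked, the per-stage state space shrinks, which is the intuition behind the linear-time learning claim.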
  • Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval
    Gupta, Sunil, Phung, Dinh, Adams, Brett, Tran, Truyen and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, July 2010. [ | | pdf]
    Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets.
    @INPROCEEDINGS { gupta_phung_adams_truyen_venkatesh_sigkdd10,
        TITLE = { Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval },
        AUTHOR = { Gupta, Sunil and Phung, Dinh and Adams, Brett and Tran, Truyen and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2010 },
        ADDRESS = { Washington DC, USA },
        MONTH = { July },
        ABSTRACT = { Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\gupta_phung_adams_truyen_venkatesh_sigkdd10.pdf:PDF },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_sigkdd10.pdf },
    }
C
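    A compact sketch of joint NMF with an explicitly shared factor block, using standard multiplicative updates. The fixed sharing level ks and all dimensions are illustrative assumptions; the paper additionally controls the level of sharing explicitly and proves convergence of its updates.
        import numpy as np

        def joint_nmf(X1, X2, ks=3, k1=2, k2=2, iters=200, eps=1e-9, seed=6):
            # Joint NMF with a shared block S: X1 ~ [S U1] H1 and X2 ~ [S U2] H2.
            # Multiplicative updates for ||X1 - [S U1] H1||^2 + ||X2 - [S U2] H2||^2.
            rng = np.random.default_rng(seed)
            d = X1.shape[0]
            S = rng.random((d, ks))
            U1, U2 = rng.random((d, k1)), rng.random((d, k2))
            H1 = rng.random((ks + k1, X1.shape[1]))
            H2 = rng.random((ks + k2, X2.shape[1]))
            for _ in range(iters):
                W1, W2 = np.hstack([S, U1]), np.hstack([S, U2])
                H1 *= (W1.T @ X1) / (W1.T @ W1 @ H1 + eps)
                H2 *= (W2.T @ X2) / (W2.T @ W2 @ H2 + eps)
                R1, R2 = W1 @ H1, W2 @ H2
                U1 *= (X1 @ H1[ks:].T) / (R1 @ H1[ks:].T + eps)
                U2 *= (X2 @ H2[ks:].T) / (R2 @ H2[ks:].T + eps)
                # The shared block pools gradient information from both sources.
                S *= (X1 @ H1[:ks].T + X2 @ H2[:ks].T) / (R1 @ H1[:ks].T + R2 @ H2[:ks].T + eps)
            return S, (U1, H1), (U2, H2)

        rng = np.random.default_rng(7)
        X1, X2 = rng.random((50, 80)), rng.random((50, 60))
        S, _, _ = joint_nmf(X1, X2)
        print(S.shape)    # (50, 3): the subspace shared by both sources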
  • Efficient duration and hierarchical modeling for human activity recognition
    Duong, Thi, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Artificial Intelligence (AIJ), 173(7-8):830-856, 2009. [ | | pdf | code]
    A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, whilst gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small number of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling.
    @ARTICLE { duong_phung_bui_venkatesh_aij09,
        AUTHOR = { Duong, Thi and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        TITLE = { Efficient duration and hierarchical modeling for human activity recognition },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2009 },
        VOLUME = { 173 },
        NUMBER = { 7-8 },
        PAGES = { 830--856 },
        ABSTRACT = { A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, whilst gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small number of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling. },
        CODE = { https://github.com/DASCIMAL/CxHSMM },
        COMMENT = { coauthor },
        DOI = { http://dx.doi.org/10.1016/j.artint.2008.12.005 },
        FILE = { :duong_phung_bui_venkatesh_aij09 - Efficient Duration and Hierarchical Modeling for Human Activity Recognition.pdf:PDF },
        KEYWORDS = { activity, recognition, duration modeling, Coxian, Hidden semi-Markov model, HSMM , smart surveillance },
        OWNER = { 184698H },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370208002142 },
    }
J
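    The discrete Coxian duration model mentioned above needs only a couple of parameters per phase, versus one parameter per possible duration for a multinomial model. This toy sampler uses made-up phase parameters and is not the paper's estimation procedure.
        import numpy as np

        def sample_coxian_duration(stay, exit_prob, rng):
            # Discrete Coxian: run through phases 1..K; in phase k each time step
            # stay with prob stay[k], otherwise leave the phase and either absorb
            # (prob exit_prob[k]) or advance to phase k+1. Duration = steps taken.
            k, steps = 0, 0
            while True:
                steps += 1
                if rng.random() < stay[k]:
                    continue
                if k == len(stay) - 1 or rng.random() < exit_prob[k]:
                    return steps
                k += 1

        rng = np.random.default_rng(8)
        stay = [0.6, 0.8, 0.9]          # illustrative per-phase self-transition probs
        exit_prob = [0.3, 0.5, 1.0]
        durations = [sample_coxian_duration(stay, exit_prob, rng) for _ in range(10000)]
        print(np.mean(durations))
    With K phases this costs O(K) parameters while still approximating a wide range of duration shapes, which is the efficiency argument made in the abstract.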
  • MCMC for Hierarchical Semi-Markov Conditional Random Fields
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS), Whistler, BC, Canada, December 2009. [ | ]
    Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_nips09,
        TITLE = { {MCMC} for Hierarchical Semi-Markov Conditional Random Fields },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS) },
        YEAR = { 2009 },
        ADDRESS = { Whistler, BC, Canada },
        MONTH = { December },
        ABSTRACT = { Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length. },
        COMMENT = { coauthor },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
    }
C
  • Ordinal Boltzmann Machines for Collaborative Filtering
    Truyen Tran, Dinh Phung and Svetha Venkatesh. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 548-556, Arlington, Virginia, United States, June 2009. (Runner-up Best Paper Award). [ | | pdf]
    Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict the preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_uai09,
        AUTHOR = { Truyen Tran and Dinh Phung and Svetha Venkatesh },
        TITLE = { Ordinal Boltzmann Machines for Collaborative Filtering },
        BOOKTITLE = { Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2009 },
        SERIES = { UAI '09 },
        PAGES = { 548--556 },
        ADDRESS = { Arlington, Virginia, United States },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        NOTE = { Runner-up Best Paper Award },
        ABSTRACT = { Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict the preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods. },
        ACMID = { 1795178 },
        COMMENT = { coauthor },
        FILE = { :truyen_phung_venkatesh_uai09 - Ordinal Boltzmann Machines for Collaborative Filtering.pdf:PDF },
        ISBN = { 978-0-9749039-5-8 },
        LOCATION = { Montreal, Quebec, Canada },
        NUMPAGES = { 9 },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://dl.acm.org/citation.cfm?id=1795114.1795178 },
    }
C
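    For flavour, a one-user sketch of an RBM over one-hot rating vectors with a single block-Gibbs sweep, in the general style of RBM-based collaborative filtering; the specific ordinal parameterisations and the joint user/item modelling studied in the paper are not reproduced, and all sizes and weights are illustrative.
        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        rng = np.random.default_rng(9)
        n_items, K, n_hidden = 6, 5, 8         # items, rating levels, hidden units
        W = 0.1 * rng.normal(size=(n_items, K, n_hidden))   # illustrative weights

        def gibbs_step(V, rng):
            # One block-Gibbs sweep: binary hidden units given the one-hot
            # ratings, then softmax ratings given the hidden units.
            h = (rng.random(n_hidden) < sigmoid(np.einsum('ik,ikh->h', V, W))).astype(float)
            logits = np.einsum('ikh,h->ik', W, h)
            probs = np.exp(logits - logits.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)
            return np.array([rng.multinomial(1, p) for p in probs]), h

        V = np.eye(K)[rng.integers(0, K, size=n_items)]     # random initial ratings
        V, h = gibbs_step(V, rng)
        print(V.argmax(axis=1) + 1)                         # sampled rating per item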
  • The Hidden Permutation Model and Location-Based Activity Recognition
    Bui, Hung, Phung, Dinh, Venkatesh, Svetha and Phan, Hai. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1345-1350, Chicago, USA, July 2008. [ | | pdf]
    Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed.
    @INPROCEEDINGS { bui_phung_venkatesh_phan_aaai08,
        TITLE = { The Hidden Permutation Model and Location-Based Activity Recognition },
        AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha and Phan, Hai },
        BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2008 },
        ADDRESS = { Chicago, USA },
        MONTH = { July },
        PAGES = { 1345--1350 },
        VOLUME = { 8 },
        ABSTRACT = { Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed. },
        FILE = { :papers\\phung\\bui_phung_venkatesh_phan_aaai08.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.aaai.org/Papers/AAAI/2008/AAAI08-213.pdf },
    }
C
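    A minimal Metropolis-Hastings sampler over permutations under a toy exponential-family score. Note the pairwise-swap proposal here is a simpler stand-in for the paper's chain-flipping move; the score function is invented for illustration.
        import numpy as np

        def mh_permutation_sampler(score, n, iters, rng):
            # MH over permutations of n items: propose swapping a random pair
            # and accept with the usual ratio exp(score(prop) - score(perm)),
            # valid here because the swap proposal is symmetric.
            perm = rng.permutation(n)
            for _ in range(iters):
                i, j = rng.choice(n, size=2, replace=False)
                prop = perm.copy()
                prop[i], prop[j] = prop[j], prop[i]
                if rng.random() < np.exp(score(prop) - score(perm)):
                    perm = prop
            return perm

        def score(perm):
            # Toy soft ordering constraint: reward item 0 appearing before item 1.
            pos = np.argsort(perm)
            return 2.0 if pos[0] < pos[1] else 0.0

        rng = np.random.default_rng(10)
        print(mh_permutation_sampler(score, n=5, iters=2000, rng=rng))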
  • Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Advances in Neural Information Processing Systems (NIPS), December 2008. [ | | ]
    Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { truyen_phung_bui_venkatesh_nips08,
        TITLE = { Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        JOURNAL = { Advances in Neural Information Processing Systems (NIPS) },
        YEAR = { 2008 },
        MONTH = { December },
        ABSTRACT = { Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        ADDRESS = { Vancouver, Canada },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/truyen_phung_bui_venkatesh_nips08.pdf },
    }
J
  • AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1686-1693, New York, USA, June 2006. [ | ]
    Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_cvpr06,
        TITLE = { {AdaBoost.MRF}: Boosted {M}arkov Random Forests and Application to Multilevel Activity Recognition },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2006 },
        ADDRESS = { New York, USA },
        MONTH = { June },
        PAGES = { 1686-1693 },
        ABSTRACT = { Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model
    Duong, Thi, Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 838-845, San Diego, 20-26 June 2005. [ | ]
    This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.
    @INPROCEEDINGS { duong_bui_phung_venkatesh_cvpr05,
        TITLE = { Activity Recognition and Abnormality Detection with the {S}witching {H}idden {S}emi-{M}arkov {M}odel },
        AUTHOR = { Duong, Thi and Bui, Hung and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        MONTH = { 20-26 June },
        PAGES = { 838--845 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. },
        KEYWORDS = { Activity Recognition, Abnormality detection, semi-Markov, hierarchical HSMM },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model
    Nguyen, N., Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 955-960, San Diego, 2005. [ | ]
    Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) for the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, to meet the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and the construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost, but also improves recognition accuracy when compared with the tree HHMM and the flat HMM.
    @INPROCEEDINGS { nguyen_phung_bui_venkatesh_cvpr05,
        TITLE = { Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model },
        AUTHOR = { Nguyen, N. and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        PAGES = { 955--960 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) for the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, to meet the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and the construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost, but also improves recognition accuracy when compared with the tree HHMM and the flat HMM. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models
    Phung, Dinh, Duong, Thi, Bui, Hung and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Multimedia (ACM-MM), Singapore, 6--11 Nov. 2005. [ | ]
    In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling.
    @INPROCEEDINGS { phung_duong_bui_venkatesh_acmmm05,
        TITLE = { Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models },
        AUTHOR = { Phung, Dinh and Duong, Thi and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Intl. Conf. on Multimedia (ACM-MM) },
        YEAR = { 2005 },
        ADDRESS = { Singapore },
        MONTH = { 6--11 Nov. },
        ABSTRACT = { In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
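    The Coxian duration device used by the S-HSMM above can be read as a chain of geometric phases with early absorption; the sketch below samples segment durations from one such discrete Coxian. The phase parameters and this particular parameterisation are illustrative assumptions, not the paper's estimator.
        import numpy as np

        rng = np.random.default_rng(1)

        def sample_coxian_duration(p_stay, p_exit):
            """p_stay[i]: self-loop prob. of phase i; p_exit[i]: prob. of absorbing when leaving phase i."""
            d, phase = 0, 0
            while True:
                d += 1
                if rng.random() < p_stay[phase]:
                    continue                      # remain in the current phase one more step
                if phase == len(p_stay) - 1 or rng.random() < p_exit[phase]:
                    return d                      # absorb: the segment ends with duration d
                phase += 1                        # otherwise advance to the next phase

        print([sample_coxian_duration([0.6, 0.5, 0.4], [0.3, 0.5, 1.0]) for _ in range(8)])
    A handful of phase parameters suffices no matter how long segments can get, which is what keeps long video sequences tractable.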
  • Hierarchical Hidden Markov Models with General State Hierarchy
    Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 324-329, San Jose, California, USA, 2004. [ | | pdf]
    The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition.
    @INPROCEEDINGS { bui_phung_venkatesh_aaai04,
        TITLE = { Hierarchical Hidden Markov Models with General State Hierarchy },
        AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2004 },
        ADDRESS = { San Jose, California, USA },
        EDITOR = { McGuinness, Deborah L. and Ferguson, George },
        PAGES = { 324--329 },
        PUBLISHER = { MIT Press },
        ABSTRACT = { The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition. },
        FILE = { :papers\\phung\\bui_phung_venkatesh_aaai04.pdf:PDF },
        GROUP = { Statistics, Hierarchical Hidden Markov Models (HMM,HHMM) },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.aaai.org/Papers/AAAI/2004/AAAI04-052.pdf },
    }
C
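    The underflow problem and the scaling remedy mentioned above are easiest to see on a flat HMM; below is a minimal scaled forward pass with toy parameters (the paper's lattice-structured HHMM inference is not shown).
        import numpy as np

        def forward_scaled(pi, T, B, obs):
            """pi: (K,) initial probs, T: (K,K) transitions, B: (K,V) emissions, obs: token ids.
            Rescales alpha each step; the logs of the scale factors sum to log p(obs)."""
            alpha = pi * B[:, obs[0]]
            logZ = np.log(alpha.sum()); alpha /= alpha.sum()
            for y in obs[1:]:
                alpha = (alpha @ T) * B[:, y]     # one forward step
                logZ += np.log(alpha.sum())       # accumulate the scale factor in log space
                alpha /= alpha.sum()              # renormalise so alpha never underflows
            return logZ

        pi = np.array([0.5, 0.5])
        T = np.array([[0.9, 0.1], [0.2, 0.8]])
        B = np.array([[0.8, 0.2], [0.3, 0.7]])
        print(forward_scaled(pi, T, B, [0, 0, 1, 0] * 200))   # long sequence, finite answer
    Without the per-step renormalisation, alpha shrinks geometrically and hits floating-point zero long before 800 observations.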
2022
  • Improving kernel online learning with a snapshot memory
    Trung Le, Khanh Nguyen and Dinh Phung. Machine Learning, Jan 2022. [ | | pdf]
    We propose in this paper the Stochastic Variance-reduced Gradient Descent for Kernel Online Learning (DualSVRG), which obtains the ε-approximate linear convergence rate and is not vulnerable to the curse of kernelization. Our approach uses a variance reduction technique to reduce the variance when estimating full gradient, and further exploits recent work in dual space gradient descent for online learning to achieve model optimality. This is achieved by introducing the concept of an instant memory, which is a snapshot storing the most recent incoming data instances and proposing three transformer oracles, namely budget, coverage, and always-move oracles. We further develop rigorous theoretical analysis to demonstrate that our proposed approach can obtain the ε-approximate linear convergence rate, while maintaining model sparsity, hence encourages fast training. We conduct extensive experiments on several benchmark datasets to compare our DualSVRG with state-of-the-art baselines in both batch and online settings. The experimental results show that our DualSVRG yields superior predictive performance, while spending comparable training time with baselines.
    @ARTICLE { le_etal_ml22_improving,
        AUTHOR = { Trung Le and Khanh Nguyen and Dinh Phung },
        JOURNAL = { Machine Learning },
        TITLE = { Improving kernel online learning with a snapshot memory },
        YEAR = { 2022 },
        MONTH = { jan },
        PAGES = { 1--22 },
        ABSTRACT = { We propose in this paper the Stochastic Variance-reduced Gradient Descent for Kernel Online Learning (DualSVRG), which obtains the ε-approximate linear convergence rate and is not vulnerable to the curse of kernelization. Our approach uses a variance reduction technique to reduce the variance when estimating full gradient, and further exploits recent work in dual space gradient descent for online learning to achieve model optimality. This is achieved by introducing the concept of an instant memory, which is a snapshot storing the most recent incoming data instances and proposing three transformer oracles, namely budget, coverage, and always-move oracles. We further develop rigorous theoretical analysis to demonstrate that our proposed approach can obtain the ε-approximate linear convergence rate, while maintaining model sparsity, hence encourages fast training. We conduct extensive experiments on several benchmark datasets to compare our DualSVRG with state-of-the-art baselines in both batch and online settings. The experimental results show that our DualSVRG yields superior predictive performance, while spending comparable training time with baselines. },
        DOI = { https://doi.org/10.1007/s10994-021-06075-7 },
        TIMESTAMP = { 2022-03-19 },
        URL = { https://link.springer.com/article/10.1007/s10994-021-06075-7 },
    }
J
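    The variance-reduction step that DualSVRG builds on is the standard SVRG correction, sketched here on a plain ridge-regression objective; the kernel and dual-space machinery of the paper is omitted, and all data are synthetic.
        import numpy as np

        rng = np.random.default_rng(2)
        X = rng.normal(size=(100, 5))
        y = X @ np.array([1.0, -2.0, 0.0, 3.0, 1.0]) + 0.1 * rng.normal(size=100)
        lam, eta, n = 0.1, 0.05, len(y)

        def grad_i(w, i):                 # gradient of 0.5*(x_i.w - y_i)^2 + 0.5*lam*||w||^2
            return (X[i] @ w - y[i]) * X[i] + lam * w

        w = np.zeros(5)
        for epoch in range(20):
            w_snap = w.copy()                                             # snapshot point
            mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)   # full gradient at the snapshot
            for _ in range(n):
                i = rng.integers(n)
                w -= eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)        # variance-reduced step
        print(np.round(w, 2))
    The correction term keeps the stochastic gradient unbiased while its variance vanishes as w approaches the snapshot, which is the mechanism behind the linear convergence rates discussed above.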
  • Node Co-Occurrence Based Graph Neural Networks for Knowledge Graph Link Prediction
    Nguyen, Dai Quoc, Tong, Vinh, Phung, Dinh and Nguyen, Dat Quoc. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pages 1589–1592, New York, NY, USA, 2022. [ | | pdf]
    We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of entities and relations. Next, NoGE proposes Dual Quaternion Graph Neural Networks (DualQGNN) and utilizes DualQGNN to update vector representations for entity and relation nodes. NoGE then adopts a score function to produce the triple scores. Comprehensive experimental results show that NoGE obtains state-of-the-art results on three new and difficult benchmark datasets CoDEx for knowledge graph completion.
    @INPROCEEDINGS { nguyen_etal_wsdm22_node_cooccurrence,
        AUTHOR = { Nguyen, Dai Quoc and Tong, Vinh and Phung, Dinh and Nguyen, Dat Quoc },
        TITLE = { Node Co-Occurrence Based Graph Neural Networks for Knowledge Graph Link Prediction },
        YEAR = { 2022 },
        ISBN = { 9781450391320 },
        PUBLISHER = { Association for Computing Machinery },
        ADDRESS = { New York, NY, USA },
        URL = { https://doi.org/10.1145/3488560.3502183 },
        DOI = { 10.1145/3488560.3502183 },
        ABSTRACT = { We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of entities and relations. Next, NoGE proposes Dual Quaternion Graph Neural Networks (DualQGNN) and utilizes DualQGNN to update vector representations for entity and relation nodes. NoGE then adopts a score function to produce the triple scores. Comprehensive experimental results show that NoGE obtains state-of-the-art results on three new and difficult benchmark datasets CoDEx for knowledge graph completion. },
        BOOKTITLE = { Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining },
        PAGES = { 1589–1592 },
        NUMPAGES = { 4 },
        KEYWORDS = { graph neural networks, knowledge graph embeddings, knowledge graph completion, quaternion },
        LOCATION = { Virtual Event, AZ, USA },
        SERIES = { WSDM '22 },
    }
C
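    A toy rendering of the co-occurrence weighting NoGE describes: entities and relations become nodes of one graph, and edge weights come from how often two nodes appear together in a triple. The normalisation below is an assumption for illustration; the DualQGNN layers are not shown.
        from collections import Counter

        triples = [("h1", "r1", "t1"), ("h1", "r2", "t2"), ("h2", "r1", "t1")]

        cooccur = Counter()
        for h, r, t in triples:                   # within a triple, every pair of nodes co-occurs
            for a, b in ((h, r), (r, t), (h, t)):
                cooccur[tuple(sorted((a, b)))] += 1

        total = sum(cooccur.values())
        edge_weight = {pair: c / total for pair, c in cooccur.items()}   # assumed normalisation
        print(edge_weight)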
  • Vietnamese Speech-Based Question Answering over Car Manuals
    Vo, Tin Duy, Luong, Manh, Le, Duong Minh, Tran, Hieu, Do, Nhan, Nguyen, Tuan-Duy H., Nguyen, Thien, Bui, Hung, Nguyen, Dat Quoc and Phung, Dinh. In 27th International Conference on Intelligent User Interfaces, pages 117–119, New York, NY, USA, 2022. [ | | pdf]
    This paper presents a novel Vietnamese speech-based question answering system QA-CarManual that enables users to ask car-manual-related questions (e.g. how to properly operate devices and/or utilities within a car). Given a car manual written in Vietnamese as the main knowledge base, we develop QA-CarManual as a lightweight, real-time and interactive system that integrates state-of-the-art technologies in language and speech processing to (i) understand and interact with users via speech commands and (ii) automatically query the knowledge base and return answers in both forms of text and speech as well as visualization. To our best knowledge, QA-CarManual is the first Vietnamese question answering system that interacts with users via speech inputs and outputs. We perform a human evaluation to assess the quality of our QA-CarManual system and obtain promising results.
    @INPROCEEDINGS { vo_etal_iui22_vietnamese_speech,
        AUTHOR = { Vo, Tin Duy and Luong, Manh and Le, Duong Minh and Tran, Hieu and Do, Nhan and Nguyen, Tuan-Duy H. and Nguyen, Thien and Bui, Hung and Nguyen, Dat Quoc and Phung, Dinh },
        TITLE = { Vietnamese Speech-Based Question Answering over Car Manuals },
        YEAR = { 2022 },
        ISBN = { 9781450391450 },
        PUBLISHER = { Association for Computing Machinery },
        ADDRESS = { New York, NY, USA },
        URL = { https://doi.org/10.1145/3490100.3516525 },
        DOI = { 10.1145/3490100.3516525 },
        ABSTRACT = { This paper presents a novel Vietnamese speech-based question answering system QA-CarManual that enables users to ask car-manual-related questions (e.g. how to properly operate devices and/or utilities within a car). Given a car manual written in Vietnamese as the main knowledge base, we develop QA-CarManual as a lightweight, real-time and interactive system that integrates state-of-the-art technologies in language and speech processing to (i) understand and interact with users via speech commands and (ii) automatically query the knowledge base and return answers in both forms of text and speech as well as visualization. To our best knowledge, QA-CarManual is the first Vietnamese question answering system that interacts with users via speech inputs and outputs. We perform a human evaluation to assess the quality of our QA-CarManual system and obtain promising results. },
        BOOKTITLE = { 27th International Conference on Intelligent User Interfaces },
        PAGES = { 117–119 },
        NUMPAGES = { 3 },
        LOCATION = { Helsinki, Finland },
        SERIES = { IUI '22 Companion },
    }
C
  • A Unified Wasserstein Distributional Robustness Framework for Adversarial Training
    Anh Tuan Bui, Trung Le, Quan Hung Tran, He Zhao and Dinh Phung. ICLR (in press). [ | ]
    It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As a result, the adversarial training (AT) method, by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to ``probe'' the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects from an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms. Extensive experiments show that our distributional robustness AT algorithms robustify further their standard AT counterparts in various settings.
    @MISC { accepted_anh_etal_iclr22_a_unified_wasserstein,
        AUTHOR = { Anh Tuan Bui and Trung Le and Quan Hung Tran and He Zhao and Dinh Phung },
        TITLE = { A Unified Wasserstein Distributional Robustness Framework for Adversarial Training },
        JOURNAL = { ICLR },
        ABSTRACT = { It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As a result, the adversarial training (AT) method, by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to ``probe'' the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects from an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms. Extensive experiments show that our distributional robustness AT algorithms robustify further their standard AT counterparts in various settings. },
        NOTE = { (in press) },
    }
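    The inner maximisation that this kind of Wasserstein distributional robustness relaxes can be sketched as gradient ascent on the loss minus a transport cost (in the spirit of WRM-style relaxations); the model, loss and hyper-parameters below are placeholders, not the paper's algorithm.
        import torch

        def dro_perturb(model, loss_fn, x, y, gamma=1.0, steps=10, lr=0.1):
            """Approximately solve max_{x'} loss(model(x'), y) - gamma * ||x' - x||^2."""
            x_adv = x.clone().detach().requires_grad_(True)
            for _ in range(steps):
                obj = loss_fn(model(x_adv), y) - gamma * ((x_adv - x) ** 2).sum()
                grad, = torch.autograd.grad(obj, x_adv)
                with torch.no_grad():
                    x_adv += lr * grad            # ascend the cost-regularised loss
            return x_adv.detach()                 # the classifier is then trained on x_adv
    Sending the transport weight gamma to infinity recovers unperturbed training, while finite gamma trades robustness against the adversary's transport budget.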
  • Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics
    Le, Tam, Nguyen, Truyen, Phung, Dinh and Nguyen, Viet Anh. AISTATS (in press). [ | ]
    Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers a few drawbacks such as (i) a high complexity for computation, (ii) indefiniteness which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport metric yields a closed-form formula for fast computation and it is negative definite. We show that the space of probability measures endowed with this transport distance is isometric to a bounded convex set in a Euclidean space with a weighted $\ell_p$ distance. We further exploit the negative definiteness of the Sobolev transport to design positive-definite kernels, and evaluate their performances against other baselines in document classification with word embeddings and in topological data analysis.
    @MISC { accepted_le_etal_aistats22_sobolev_transport,
        AUTHOR = { Le, Tam and Nguyen, Truyen and Phung, Dinh and Nguyen, Viet Anh },
        TITLE = { Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics },
        JOURNAL = { AISTATS },
        ABSTRACT = { Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers a few drawbacks such as (i) a high complexity for computation, (ii) indefiniteness which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport metric yields a closed-form formula for fast computation and it is negative definite. We show that the space of probability measures endowed with this transport distance is isometric to a bounded convex set in a Euclidean space with a weighted $\ell_p$ distance. We further exploit the negative definiteness of the Sobolev transport to design positive-definite kernels, and evaluate their performances against other baselines in document classification with word embeddings and in topological data analysis. },
        NOTE = { (in press) },
    }
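    On a tree, distances of this family reduce to a weighted l_p norm of the mass differences routed through each edge, which is what makes them closed-form and fast; a hedged sketch on a toy tree (the paper treats general graph metric spaces, and the weights here are illustrative).
        # toy tree as child -> (parent, edge_length); node 0 is the root
        tree = {1: (0, 1.0), 2: (0, 2.0), 3: (1, 0.5)}

        def sobolev_like_transport(mu, nu, p=2):
            flow = {v: mu.get(v, 0.0) - nu.get(v, 0.0) for v in tree}   # net mass below each edge
            for v in sorted(tree, reverse=True):                        # children before parents here
                parent = tree[v][0]
                if parent in tree:
                    flow[parent] += flow[v]                             # push differences rootwards
            return sum(w * abs(flow[v]) ** p for v, (_, w) in tree.items()) ** (1 / p)

        mu = {1: 0.5, 3: 0.5}
        nu = {2: 1.0}
        print(sobolev_like_transport(mu, nu))
    Everything is a single pass over the edges, so the cost is linear in the size of the tree, in contrast to solving a generic OT linear program.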
  • Particle-based Adversarial Local Distribution Regularization
    Nguyen-Duc, Thanh, Le, Trung, Zhao, He, Cai, Jianfei and Phung, Dinh. AISTATS (in press). [ | ]
    To-be-updated
    @MISC { accepted_nguyen_etal_aistats22_particle_based,
        AUTHOR = { Nguyen-Duc, Thanh and Le, Trung and Zhao, He and Cai, Jianfei and Phung, Dinh },
        TITLE = { Particle-based Adversarial Local Distribution Regularization },
        JOURNAL = { AISTATS },
        ABSTRACT = { To-be-updated },
        NOTE = { (in press) },
    }
2021
  • Neural Topic Model via Optimal Transport
    He Zhao, Dinh Phung, Viet Huynh, Trung Le and Wray Buntine. In Proc. of the 9th Int. Conf. on Learning Representations (ICLR), 2021. [ | ]
    @INPROCEEDINGS { zhao_etal_iclr2021_neural,
        AUTHOR = { He Zhao and Dinh Phung and Viet Huynh and Trung Le and Wray Buntine },
        BOOKTITLE = { Proc. of the 9th Int. Conf. on Learning Representations (ICLR) },
        TITLE = { Neural Topic Model via Optimal Transport },
        YEAR = { 2021 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 2008.13537 },
        PRIMARYCLASS = { cs.IR },
        TIMESTAMP = { 2021-01-13 },
    }
C
  • Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness
    Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham and Dinh Phung. In Proc. of the AAAI Conf. on Artificial Intelligence (AAAI), 2021. [ | ]
    Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, hence help us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, at the same time can detect a wide range of adversarial examples with a nearly perfect accuracy.
    @INPROCEEDINGS { bui_etal_20aaai_improving,
        AUTHOR = { Anh Bui and Trung Le and He Zhao and Paul Montague and Olivier deVel and Tamas Abraham and Dinh Phung },
        BOOKTITLE = { Proc. of the AAAI Conf. on Artificial Intelligence (AAAI) },
        TITLE = { Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness },
        YEAR = { 2021 },
        ABSTRACT = { Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, hence help us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, at the same time can detect a wide range of adversarial examples with a nearly perfect accuracy. },
        DATE = { 2020-09-21 },
        EPRINT = { 2009.09612 },
        EPRINTCLASS = { cs.CV },
        EPRINTTYPE = { arXiv },
        FILE = { :bui_etal_20aaai_improving - Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness.pdf:PDF },
        KEYWORDS = { cs.CV, cs.LG },
    }
C
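    The transferability being controlled above can be probed directly: craft an adversarial example on one committee member and check whether it also fools another. A stand-in measurement using FGSM as the probe attack; the models and data are placeholders.
        import torch
        import torch.nn.functional as F

        def fgsm(model, x, y, eps=0.03):
            x = x.clone().detach().requires_grad_(True)
            F.cross_entropy(model(x), y).backward()
            return (x + eps * x.grad.sign()).detach()

        def transfer_rate(src, dst, x, y, eps=0.03):
            x_adv = fgsm(src, x, y, eps)
            fooled_src = src(x_adv).argmax(dim=1) != y
            fooled_dst = dst(x_adv).argmax(dim=1) != y   # transferred if both members are fooled
            return (fooled_src & fooled_dst).float().mean().item()
    A lower transfer rate between members is exactly the kind of diversity the proposed regulariser promotes.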
  • Exploiting Domain-Specific Features to Enhance Domain Generalization
    Ha Bui, Toan Tran, Anh Tuan Tran and Dinh Phung. In Advances in Neural Information Processing Systems, 2021. [ | | pdf]
    @INPROCEEDINGS { ha_etal_neurips21_exploiting,
        TITLE = { Exploiting Domain-Specific Features to Enhance Domain Generalization },
        AUTHOR = { Ha Bui and Toan Tran and Anh Tuan Tran and Dinh Phung },
        BOOKTITLE = { Advances in Neural Information Processing Systems },
        EDITOR = { A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan },
        YEAR = { 2021 },
        URL = { https://openreview.net/forum?id=vKxFYApxBjr },
    }
C
  • On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources
    Trung Quoc Phung, Trung Le, Long Tung Vuong, Toan Tran, Anh Tuan Tran, Hung Bui and Dinh Phung. In Advances in Neural Information Processing Systems, 2021. [ | | pdf]
    @INPROCEEDINGS { trung_etal_neurips21_on_learning_domain_invariant,
        TITLE = { On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources },
        AUTHOR = { Trung Quoc Phung and Trung Le and Long Tung Vuong and Toan Tran and Anh Tuan Tran and Hung Bui and Dinh Phung },
        BOOKTITLE = { Advances in Neural Information Processing Systems },
        EDITOR = { A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan },
        YEAR = { 2021 },
        URL = { https://openreview.net/forum?id=LkNBNOut0oD },
    }
C
  • MOST: multi-source domain adaptation via optimal transport for student-teacher learning
    Nguyen, Tuan, Le, Trung, Zhao, He, Tran, Quan Hung, Nguyen, Truyen and Phung, Dinh. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 225-235, 27--30 Jul 2021. [ | | pdf]
    Multi-source domain adaptation (DA) is more challenging than conventional DA because the knowledge is transferred from several source domains to a target domain. To this end, we propose in this paper a novel model for multi-source DA using the theory of optimal transport and imitation learning. More specifically, our approach consists of two cooperative agents: a teacher classifier and a student classifier. The teacher classifier is a combined expert that leverages knowledge of domain experts that can be theoretically guaranteed to handle perfectly source examples, while the student classifier acting on the target domain tries to imitate the teacher classifier acting on the source domains. Our rigorous theory developed based on optimal transport makes this cross-domain imitation possible and also helps to mitigate not only the data shift but also the label shift, which are inherently thorny issues in DA research. We conduct comprehensive experiments on real-world datasets to demonstrate the merit of our approach and its optimal transport based imitation learning viewpoint. Experimental results show that our proposed method achieves state-of-the-art performance on benchmark datasets for multi-source domain adaptation including Digits-five, Office-Caltech10, and Office-31 to the best of our knowledge.
    @INPROCEEDINGS { nguyen_etal_uai21_most,
        TITLE = { MOST: multi-source domain adaptation via optimal transport for student-teacher learning },
        AUTHOR = { Nguyen, Tuan and Le, Trung and Zhao, He and Tran, Quan Hung and Nguyen, Truyen and Phung, Dinh },
        BOOKTITLE = { Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence },
        PAGES = { 225--235 },
        YEAR = { 2021 },
        EDITOR = { de Campos, Cassio and Maathuis, Marloes H. },
        VOLUME = { 161 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 27--30 Jul },
        PUBLISHER = { PMLR },
        PDF = { https://proceedings.mlr.press/v161/nguyen21a/nguyen21a.pdf },
        URL = { https://proceedings.mlr.press/v161/nguyen21a.html },
        ABSTRACT = { Multi-source domain adaptation (DA) is more challenging than conventional DA because the knowledge is transferred from several source domains to a target domain. To this end, we propose in this paper a novel model for multi-source DA using the theory of optimal transport and imitation learning. More specifically, our approach consists of two cooperative agents: a teacher classifier and a student classifier. The teacher classifier is a combined expert that leverages knowledge of domain experts that can be theoretically guaranteed to handle perfectly source examples, while the student classifier acting on the target domain tries to imitate the teacher classifier acting on the source domains. Our rigorous theory developed based on optimal transport makes this cross-domain imitation possible and also helps to mitigate not only the data shift but also the label shift, which are inherently thorny issues in DA research. We conduct comprehensive experiments on real-world datasets to demonstrate the merit of our approach and its optimal transport based imitation learning viewpoint. Experimental results show that our proposed method achieves state-of-the-art performance on benchmark datasets for multi-source domain adaptation including Digits-five, Office-Caltech10, and Office-31 to the best of our knowledge. },
    }
C
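    A toy rendering of the teacher-student mechanics described above: a frozen mixture of source experts provides soft targets that the student imitates on unlabelled target data. The fixed mixture weights and KL imitation loss are simplifications; the paper derives the combination and its guarantees from optimal transport.
        import torch
        import torch.nn.functional as F

        def imitation_step(student, experts, weights, x_target, opt):
            with torch.no_grad():                 # the teacher is a fixed mixture of source experts
                teacher = sum(w * F.softmax(m(x_target), dim=1) for w, m in zip(weights, experts))
            loss = F.kl_div(F.log_softmax(student(x_target), dim=1), teacher, reduction="batchmean")
            opt.zero_grad(); loss.backward(); opt.step()
            return loss.item()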
  • Quaternion Graph Neural Networks
    Nguyen, Dai Quoc, Nguyen, Tu Dinh and Phung, Dinh. In Proceedings of The 13th Asian Conference on Machine Learning, pages 236-251, 17--19 Nov 2021. [ | | pdf]
    Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}.
    @INPROCEEDINGS { nguyen_etal_acml21_quaternion,
        TITLE = { Quaternion Graph Neural Networks },
        AUTHOR = { Nguyen, Dai Quoc and Nguyen, Tu Dinh and Phung, Dinh },
        BOOKTITLE = { Proceedings of The 13th Asian Conference on Machine Learning },
        PAGES = { 236--251 },
        YEAR = { 2021 },
        EDITOR = { Balasubramanian, Vineeth N. and Tsang, Ivor },
        VOLUME = { 157 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 17--19 Nov },
        PUBLISHER = { PMLR },
        PDF = { https://proceedings.mlr.press/v157/nguyen21a/nguyen21a.pdf },
        URL = { https://proceedings.mlr.press/v157/nguyen21a.html },
        ABSTRACT = { Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}. },
    }
C
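    The extra interaction structure QGNN gets from the Quaternion space comes down to the Hamilton product, transcribed below directly from the quaternion multiplication rule (the graph layers that apply it are not shown).
        import numpy as np

        def hamilton(q, p):
            """Hamilton product of quaternions q = (r, x, y, z) and p."""
            r1, x1, y1, z1 = q
            r2, x2, y2, z2 = p
            return np.array([
                r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,   # real part
                r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,   # i
                r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,   # j
                r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,   # k
            ])

        print(hamilton([0, 1, 0, 0], [0, 0, 1, 0]))      # i * j = k -> [0, 0, 0, 1]
    Unlike an elementwise or complex product, every component of the output mixes all four input components, which is the parameter sharing the abstract appeals to.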
  • Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection
    "Vu. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3335-3346, "Online and Punta Cana, nov 2021. [ | | pdf]
    "This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT)
    @INPROCEEDINGS { vu_etal_emnlp21_generalised,
        TITLE = { Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection },
        AUTHOR = { "Vu },
        BOOKTITLE = { Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing },
        MONTH = { nov },
        YEAR = { 2021 },
        ADDRESS = { "Online and Punta Cana },
        PUBLISHER = { Association for Computational Linguistics },
        URL = { https://aclanthology.org/2021.emnlp-main.268 },
        DOI = { 10.18653/v1/2021.emnlp-main.268 },
        PAGES = { 3335--3346 },
        ABSTRACT = { "This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT) },
    }
C
  • Information-theoretic Source Code Vulnerability Highlighting
    Nguyen, Van, Le, Trung, De Vel, Olivier, Montague, Paul, Grundy, John and Phung, Dinh. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1-8, 2021. [ | ]
    @INPROCEEDINGS { nguyen_etal_ijcnn21_information_theoretic,
        AUTHOR = { Nguyen, Van and Le, Trung and De Vel, Olivier and Montague, Paul and Grundy, John and Phung, Dinh },
        BOOKTITLE = { 2021 International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Information-theoretic Source Code Vulnerability Highlighting },
        YEAR = { 2021 },
        PAGES = { 1-8 },
        DOI = { 10.1109/IJCNN52387.2021.9533907 },
    }
C
  • LAMDA: Label Matching Deep Domain Adaptation
    Le, Trung, Nguyen, Tuan, Ho, Nhat, Bui, Hung and Phung, Dinh. In Proceedings of the 38th International Conference on Machine Learning, pages 6043-6054, 18--24 Jul 2021. [ | | pdf]
    Deep domain adaptation (DDA) approaches have recently been shown to perform better than their shallow rivals with better modeling capacity on complex domains (e.g., image, structural data, and sequential data). The underlying idea is to learn domain invariant representations on a latent space that can bridge the gap between source and target domains. Several theoretical studies have established insightful understanding and the benefit of learning domain invariant features; however, they are usually limited to the case where there is no label shift, hence hindering its applicability. In this paper, we propose and study a new challenging setting that allows us to use a Wasserstein distance (WS) to not only quantify the data shift but also to define the label shift directly. We further develop a theory to demonstrate that minimizing the WS of the data shift leads to closing the gap between the source and target data distributions on the latent space (e.g., an intermediate layer of a deep net), while still being able to quantify the label shift with respect to this latent space. Interestingly, our theory can consequently explain certain drawbacks of learning domain invariant features on the latent space. Finally, grounded on the results and guidance of our developed theory, we propose the Label Matching Deep Domain Adaptation (LAMDA) approach that outperforms baselines on real-world datasets for DA problems.
    @INPROCEEDINGS { le_etal_icml21_lamda,
        TITLE = { LAMDA: Label Matching Deep Domain Adaptation },
        AUTHOR = { Le, Trung and Nguyen, Tuan and Ho, Nhat and Bui, Hung and Phung, Dinh },
        BOOKTITLE = { Proceedings of the 38th International Conference on Machine Learning },
        PAGES = { 6043--6054 },
        YEAR = { 2021 },
        EDITOR = { Meila, Marina and Zhang, Tong },
        VOLUME = { 139 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 18--24 Jul },
        PUBLISHER = { PMLR },
        PDF = { http://proceedings.mlr.press/v139/le21a/le21a.pdf },
        URL = { https://proceedings.mlr.press/v139/le21a.html },
        ABSTRACT = { Deep domain adaptation (DDA) approaches have recently been shown to perform better than their shallow rivals with better modeling capacity on complex domains (e.g., image, structural data, and sequential data). The underlying idea is to learn domain invariant representations on a latent space that can bridge the gap between source and target domains. Several theoretical studies have established insightful understanding and the benefit of learning domain invariant features; however, they are usually limited to the case where there is no label shift, hence hindering its applicability. In this paper, we propose and study a new challenging setting that allows us to use a Wasserstein distance (WS) to not only quantify the data shift but also to define the label shift directly. We further develop a theory to demonstrate that minimizing the WS of the data shift leads to closing the gap between the source and target data distributions on the latent space (e.g., an intermediate layer of a deep net), while still being able to quantify the label shift with respect to this latent space. Interestingly, our theory can consequently explain certain drawbacks of learning domain invariant features on the latent space. Finally, grounded on the results and guidance of our developed theory, we propose the Label Matching Deep Domain Adaptation (LAMDA) approach that outperforms baselines on real-world datasets for DA problems. },
    }
C
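    The quantity LAMDA's theory revolves around, shift measured by a Wasserstein distance on a latent space, is easy to estimate in one dimension; a back-of-the-envelope illustration with SciPy on synthetic latent features (not the paper's construction).
        import numpy as np
        from scipy.stats import wasserstein_distance

        rng = np.random.default_rng(3)
        source_latent = rng.normal(0.0, 1.0, size=1000)   # stand-in source features on the latent space
        target_latent = rng.normal(0.8, 1.2, size=1000)   # shifted stand-in target features
        print(wasserstein_distance(source_latent, target_latent))   # dominated by the 0.8 mean shift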
  • Topic Modelling Meets Deep Neural Networks: A Survey
    Zhao, He, Phung, Dinh, Huynh, Viet, Jin, Yuan, Du, Lan and Buntine, Wray. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 4713-4720, Aug 2021. (Survey Track). [ | | pdf]
    @INPROCEEDINGS { zhao_etal_ijcai21_topic_modelling,
        TITLE = { Topic Modelling Meets Deep Neural Networks: A Survey },
        AUTHOR = { Zhao, He and Phung, Dinh and Huynh, Viet and Jin, Yuan and Du, Lan and Buntine, Wray },
        BOOKTITLE = { Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        EDITOR = { Zhi-Hua Zhou },
        PAGES = { 4713--4720 },
        YEAR = { 2021 },
        MONTH = { 8 },
        NOTE = { Survey Track },
        DOI = { 10.24963/ijcai.2021/638 },
        URL = { https://doi.org/10.24963/ijcai.2021/638 },
    }
C
  • Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability
    Hossam, Mahmoud, Le, Trung, Zhao, He and Phung, Dinh. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 8922-8928, 2021. [ | ]
    @INPROCEEDINGS { hossam_etal_icpr21_explain2attack,
        AUTHOR = { Hossam, Mahmoud and Le, Trung and Zhao, He and Phung, Dinh },
        BOOKTITLE = { 2020 25th International Conference on Pattern Recognition (ICPR) },
        TITLE = { Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability },
        YEAR = { 2021 },
        PAGES = { 8922-8928 },
        DOI = { 10.1109/ICPR48806.2021.9412526 },
    }
C
  • STEM: An Approach to Multi-Source Domain Adaptation With Guarantees
    Nguyen, Van-Anh, Nguyen, Tuan, Le, Trung, Tran, Quan Hung and Phung, Dinh. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9352-9363, October 2021. [ | ]
    @INPROCEEDINGS { nguyen_etal_iccv21_stem,
        AUTHOR = { Nguyen, Van-Anh and Nguyen, Tuan and Le, Trung and Tran, Quan Hung and Phung, Dinh },
        TITLE = { STEM: An Approach to Multi-Source Domain Adaptation With Guarantees },
        BOOKTITLE = { Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) },
        MONTH = { October },
        YEAR = { 2021 },
        PAGES = { 9352-9363 },
    }
C
  • Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records
    Rana, Santu, Luo, Wei, Tran, Truyen, Venkatesh, Svetha, Talman, Paul, Phan, Thanh, Phung, Dinh and Clissold, Benjamin. Frontiers in Neurology, 12, 2021. [ | | pdf]
    Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke specific clinical factors, using machine learning techniques. Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] were considered and five different comparison outcome settings were considered. Our electronic administrative record based predictive model was compared with a predictive model composed of “baseline” clinical features, more specific for stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified. Results: The data was highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input. Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes.
    @ARTICLE { rana_etal_frontiersin21_application,
        AUTHOR = { Rana, Santu and Luo, Wei and Tran, Truyen and Venkatesh, Svetha and Talman, Paul and Phan, Thanh and Phung, Dinh and Clissold, Benjamin },
        TITLE = { Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records },
        JOURNAL = { Frontiers in Neurology },
        VOLUME = { 12 },
        YEAR = { 2021 },
        URL = { https://www.frontiersin.org/article/10.3389/fneur.2021.670379 },
        DOI = { 10.3389/fneur.2021.670379 },
        ISSN = { 1664-2295 },
        ABSTRACT = { Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke specific clinical factors, using machine learning techniques. Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] were considered and five different comparison outcome settings were considered. Our electronic administrative record based predictive model was compared with a predictive model composed of “baseline” clinical features, more specific for stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified. Results: The data was highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input. Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes. },
    }
J
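    The modelling device named in the method, penalised (Lasso) regression, in a minimal self-contained form on synthetic data; the cohort's variables and outcomes are deliberately not reproduced here.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(4)
        X = rng.normal(size=(500, 20))                    # stand-in administrative features
        y = (X[:, 0] - 2 * X[:, 3] + rng.normal(size=500) > 0).astype(int)   # stand-in outcome

        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X[:400], y[:400])
        print("held-out accuracy:", clf.score(X[400:], y[400:]))
        print("selected features:", np.nonzero(clf.coef_[0])[0])   # the L1 penalty zeroes out the rest
    The L1 penalty is what yields the short, interpretable list of risk factors the study reports.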
  • Optimal Transport for Deep Generative Models: State of the Art and Research Challenges
    Huynh, Viet, Phung, Dinh and Zhao, He. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 4450-4457, Aug 2021. (Survey Track). [ | | pdf]
    @INPROCEEDINGS { huynh_etal_ijcai21_optimal_transport,
        TITLE = { Optimal Transport for Deep Generative Models: State of the Art and Research Challenges },
        AUTHOR = { Huynh, Viet and Phung, Dinh and Zhao, He },
        BOOKTITLE = { Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        EDITOR = { Zhi-Hua Zhou },
        PAGES = { 4450--4457 },
        YEAR = { 2021 },
        MONTH = { 8 },
        NOTE = { Survey Track },
        DOI = { 10.24963/ijcai.2021/607 },
        URL = { https://doi.org/10.24963/ijcai.2021/607 },
    }
C
  • TIDOT: A Teacher Imitation Learning Approach for Domain Adaptation with Optimal Transport
    Nguyen, Tuan, Le, Trung, Dam, Nhan, Tran, Quan Hung, Nguyen, Truyen and Phung, Dinh. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 2862-2868, Aug 2021. (Main Track). [ | | pdf]
    @INPROCEEDINGS { nguyen_etal_ijcai21_tidot,
        TITLE = { TIDOT: A Teacher Imitation Learning Approach for Domain Adaptation with Optimal Transport },
        AUTHOR = { Nguyen, Tuan and Le, Trung and Dam, Nhan and Tran, Quan Hung and Nguyen, Truyen and Phung, Dinh },
        BOOKTITLE = { Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        EDITOR = { Zhi-Hua Zhou },
        PAGES = { 2862--2868 },
        YEAR = { 2021 },
        MONTH = { 8 },
        NOTE = { Main Track },
        DOI = { 10.24963/ijcai.2021/394 },
        URL = { https://doi.org/10.24963/ijcai.2021/394 },
    }
C
  • On efficient multilevel Clustering via Wasserstein distances
    Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui and Dinh Phung. Journal of Machine Learning Research, 22(145):1-43, 2021. [ | | pdf]
    @ARTICLE { viet_etal_jmlr21_on_efficient_multilevel,
        AUTHOR = { Viet Huynh and Nhat Ho and Nhan Dam and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Dinh Phung },
        TITLE = { On efficient multilevel Clustering via Wasserstein distances },
        JOURNAL = { Journal of Machine Learning Research },
        YEAR = { 2021 },
        VOLUME = { 22 },
        NUMBER = { 145 },
        PAGES = { 1-43 },
        URL = { http://jmlr.org/papers/v22/19-782.html },
    }
J
  • The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project design and methodologies: a dimensional approach to understanding neurobiological and genetic aetiology
    Knott Rachael, Johnson Beth P., Tiego Jeggan, Mellahn Olivia, Finlay Amy, Kallady Kathryn, Kouspos Maria, Mohanakumar Sindhu Vishnu Priya, Hawi Ziarih, Arnatkeviciute Aurina, Chau Tracey, Maron Dalia, Mercieca Emily-Clare, Furley Kirsten, Harris Katrina, Williams Katrina, Ure Alexandra, Fornito Alex, Gray Kylie, Coghill David, Nicholson Ann, Phung Dinh, Loth Eva, Mason Luke, Murphy Declan, Buitelaar Jan and Bellgrove Mark A.. Molecular Autism, 12(1):55, Aug 2021. [ | | pdf]
    ASD and ADHD are prevalent neurodevelopmental disorders that frequently co-occur and have strong evidence for a degree of shared genetic aetiology. Behavioural and neurocognitive heterogeneity in ASD and ADHD has hampered attempts to map the underlying genetics and neurobiology, predict intervention response, and improve diagnostic accuracy. Moving away from categorical conceptualisations of psychopathology to a dimensional approach is anticipated to facilitate discovery of data-driven clusters and enhance our understanding of the neurobiological and genetic aetiology of these conditions. The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project is one of the first large-scale, family-based studies to take a truly transdiagnostic approach to ASD and ADHD. Using a comprehensive phenotyping protocol capturing dimensional traits central to ASD and ADHD, the MAGNET project aims to identify data-driven clusters across ADHD-ASD spectra using deep phenotyping of symptoms and behaviours; investigate the degree of familiality for different dimensional ASD-ADHD phenotypes and clusters; and map the neurocognitive, brain imaging, and genetic correlates of these data-driven symptom-based clusters.
    @ARTICLE { knott_etal_MolecularAutism21_the_monash_autism,
        AUTHOR = { Knott Rachael and Johnson Beth P. and Tiego Jeggan and Mellahn Olivia and Finlay Amy and Kallady Kathryn and Kouspos Maria and Mohanakumar Sindhu Vishnu Priya and Hawi Ziarih and Arnatkeviciute Aurina and Chau Tracey and Maron Dalia and Mercieca Emily-Clare and Furley Kirsten and Harris Katrina and Williams Katrina and Ure Alexandra and Fornito Alex and Gray Kylie and Coghill David and Nicholson Ann and Phung Dinh and Loth Eva and Mason Luke and Murphy Declan and Buitelaar Jan and Bellgrove Mark A. },
        TITLE = { The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project design and methodologies: a dimensional approach to understanding neurobiological and genetic aetiology },
        JOURNAL = { Molecular Autism },
        YEAR = { 2021 },
        MONTH = { Aug },
        DAY = { 05 },
        VOLUME = { 12 },
        NUMBER = { 1 },
        PAGES = { 55 },
        ABSTRACT = { ASD and ADHD are prevalent neurodevelopmental disorders that frequently co-occur and have strong evidence for a degree of shared genetic aetiology. Behavioural and neurocognitive heterogeneity in ASD and ADHD has hampered attempts to map the underlying genetics and neurobiology, predict intervention response, and improve diagnostic accuracy. Moving away from categorical conceptualisations of psychopathology to a dimensional approach is anticipated to facilitate discovery of data-driven clusters and enhance our understanding of the neurobiological and genetic aetiology of these conditions. The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project is one of the first large-scale, family-based studies to take a truly transdiagnostic approach to ASD and ADHD. Using a comprehensive phenotyping protocol capturing dimensional traits central to ASD and ADHD, the MAGNET project aims to identify data-driven clusters across ADHD-ASD spectra using deep phenotyping of symptoms and behaviours; investigate the degree of familiality for different dimensional ASD-ADHD phenotypes and clusters; and map the neurocognitive, brain imaging, and genetic correlates of these data-driven symptom-based clusters. },
        ISSN = { 2040-2392 },
        DOI = { 10.1186/s13229-021-00457-3 },
        URL = { https://doi.org/10.1186/s13229-021-00457-3 },
    }
J
2020
  • Robust Variational Learning for Multiclass Kernel Models with Stein Refinement
    Khanh Nguyen, Trung Le, Geoff Webb and Dinh Phung. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020. [ | ]
    @ARTICLE { nguyen_etal_tkde20_robusvariational,
        AUTHOR = { Khanh Nguyen and Trung Le and Geoff Webb and Dinh Phung },
        JOURNAL = { IEEE Transactions on Knowledge and Data Engineering (TKDE) },
        TITLE = { Robust Variational Learning for Multiclass Kernel Models with Stein Refinement },
        YEAR = { 2020 },
    }
J
  • Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
    Quan Hung Tran, Nhan Dam, Tuan Lai, Franck Dernoncourt, Trung Le, Nham Le and Dinh Phung. In Proc. of the 28th Int. Conf. on Computational Linguistics (COLING), 2020. [ | | pdf]
    Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, human tends to rely on similar situations and/or associations in the past. Hence arguably, a promising approach to make the model transparent is to design it in a way such that the model explicitly connects the current sample with the seen ones, and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidences to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e. TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications.
    @INPROCEEDINGS { tran_etal_coling20_explainbyevidence,
        AUTHOR = { Quan Hung Tran and Nhan Dam and Tuan Lai and Franck Dernoncourt and Trung Le and Nham Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 28th Int. Conf. on Computational Linguistics (COLING) },
        TITLE = { Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering },
        YEAR = { 2020 },
        ABSTRACT = { Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, human tends to rely on similar situations and/or associations in the past. Hence arguably, a promising approach to make the model transparent is to design it in a way such that the model explicitly connects the current sample with the seen ones, and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidences to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e. TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications. },
        FILE = { :tran_etal_coling20_explainbyevidence - Explain by Evidence_ an Explainable Memory Based Neural Network for Question Answering.pdf:PDF },
        URL = { https://arxiv.org/abs/2011.03096 },
    }
C
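    A toy sketch of the evidence-retrieval principle above: keep encoded training instances in a memory and surface the nearest ones as supporting evidence for a decision. The plain cosine-similarity store below is an assumption; the paper learns a compact dataset summary rather than storing everything.
        import numpy as np

        class EvidenceMemory:
            """Stores (encoding, label) pairs; read() returns the most similar stored instances."""

            def __init__(self):
                self.keys, self.labels = [], []

            def write(self, vec, label):
                v = np.asarray(vec, dtype=float)
                self.keys.append(v / np.linalg.norm(v))
                self.labels.append(label)

            def read(self, vec, k=3):
                v = np.asarray(vec, dtype=float)
                sims = np.stack(self.keys) @ (v / np.linalg.norm(v))
                top = np.argsort(-sims)[:k]               # indices of the supporting evidences
                return [(self.labels[i], round(float(sims[i]), 3)) for i in top]

        mem = EvidenceMemory()
        for vec, lab in [([1, 0], "yes"), ([0, 1], "no"), ([0.9, 0.1], "yes")]:
            mem.write(vec, lab)
        print(mem.read([1.0, 0.2]))                       # retrieved evidence for a new query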
  • Transfer2Attack: Text Adversarial Attack with Cross-Domain Interpretability
    Mahmoud Hossam, Trung Le, He Zhao and Dinh Phung. In Proc. of the 25th Int. Conf. on Pattern Recognition (ICPR), 2020. [ | ]
    Training robust deep learning models is a critical challenge for downstream tasks. Research has shown that common down-stream models can be easily fooled with adversarial inputs that look like the training data, but slightly perturbed, in a way imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to better defend these models against such attacks. In the black-box attack setting, where no access to model parameters is available, the attacker can only query the output information from the targeted model to craft a successful attack. Current black-box state-of-the-art models are costly in both computational complexity and number of queries needed to craft successful adversarial examples. For real world scenarios, the number of queries is critical, where less queries are desired to avoid suspicion towards an attacking agent. In this paper, we propose Transfer2Attack, a black-box adversarial attack on text classification task, that employs cross-domain interpretability to reduce target model queries during attack. We show that our framework either achieves or out-performs attack rates of the state-of-the-art models, yet with lower queries cost and higher efficiency.
    @INPROCEEDINGS { hossam_etal_icpr20_transfer2attack,
        AUTHOR = { Mahmoud Hossam and Trung Le and He Zhao and Dinh Phung },
        BOOKTITLE = { Proc. of the 25th Int. Conf. on Pattern Recognition (ICPR) },
        TITLE = { {Transfer2Attack}: Text Adversarial Attack with Cross-Domain Interpretability },
        YEAR = { 2020 },
        ABSTRACT = { Training robust deep learning models is a critical challenge for downstream tasks. Research has shown that common downstream models can be easily fooled with adversarial inputs that look like the training data but are slightly perturbed in a way imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to better defend these models against such attacks. In the black-box attack setting, where no access to model parameters is available, the attacker can only query the output information from the targeted model to craft a successful attack. Current state-of-the-art black-box models are costly in both computational complexity and the number of queries needed to craft successful adversarial examples. In real-world scenarios, the number of queries is critical: fewer queries are desired to avoid suspicion towards an attacking agent. In this paper, we propose Transfer2Attack, a black-box adversarial attack on the text classification task that employs cross-domain interpretability to reduce target-model queries during the attack. We show that our framework matches or outperforms the attack rates of state-of-the-art models, yet with lower query cost and higher efficiency. },
        FILE = { :hossam_etal_icpr20_transfer2attack - Transfer2Attack_ Text Adversarial Attack with Cross Domain Interpretability.pdf:PDF },
    }
C
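    The query-saving idea above can be sketched as follows: a saliency ranking obtained from an interpretable source-domain model (assumed given) decides which tokens to perturb first, so the black-box target model is queried only to verify candidates. The function names, the synonym table, and the greedy strategy are hypothetical illustrations, not the published method.
        def attack(tokens, saliency, synonyms, target_predict, true_label, budget=20):
            """Greedy word-substitution attack guided by cross-domain saliency.
            target_predict is the only call that touches the black-box model."""
            order = sorted(range(len(tokens)), key=lambda i: -saliency[i])
            adv, queries = list(tokens), 0
            for i in order:
                for cand in synonyms.get(tokens[i], []):
                    adv[i] = cand
                    queries += 1                      # each verification costs one query
                    if target_predict(adv) != true_label:
                        return adv, queries           # adversarial example found
                    if queries >= budget:
                        return None, queries
                adv[i] = tokens[i]                    # revert if nothing flipped the label
            return None, queries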
  • A Capsule Network-based Model for Learning Node Embeddings
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 29th ACM Int. Conf. on Information and Knowledge Management (CIKM), 2020. (Our code is available at: \url{https://github.com/daiquocnguyen/Caps2NE}). [ | | pdf]
    In this paper, we focus on learning low-dimensional embeddings for nodes in graph-structured data. To achieve this, we propose Caps2NE -- a new unsupervised embedding model leveraging a network of two capsule layers. Caps2NE induces a routing process to aggregate feature vectors of context neighbors of a given target node at the first capsule layer, then feeds these features into the second capsule layer to infer a plausible embedding for the target node. Experimental results show that our proposed Caps2NE obtains state-of-the-art performances on benchmark datasets for the node classification task.
    @INPROCEEDINGS { nguyen_etal_cikm20_capsule,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 29th ACM Int. Conf. on Information and Knowledge Management (CIKM) },
        TITLE = { A Capsule Network-based Model for Learning Node Embeddings },
        YEAR = { 2020 },
        NOTE = { Our code is available at: \url{https://github.com/daiquocnguyen/Caps2NE} },
        ABSTRACT = { In this paper, we focus on learning low-dimensional embeddings for nodes in graph-structured data. To achieve this, we propose Caps2NE -- a new unsupervised embedding model leveraging a network of two capsule layers. Caps2NE induces a routing process to aggregate feature vectors of context neighbors of a given target node at the first capsule layer, then feeds these features into the second capsule layer to infer a plausible embedding for the target node. Experimental results show that our proposed Caps2NE obtains state-of-the-art performances on benchmark datasets for the node classification task. },
        FILE = { :nguyen_etal_cikm20_capsule - A Capsule Network Based Model for Learning Node Embeddings.pdf:PDF },
        URL = { https://arxiv.org/abs/1911.04822 },
    }
C
  • A Self-Attention Network based Node Embedding Model
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2020. [ | ]
    Despite recent progress, limited research has been conducted in the inductive setting, where embeddings are required for newly unseen nodes – a setting commonly encountered in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE – a novel unsupervised embedding model – whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. As a consequence, SANNE can produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that our unsupervised SANNE obtains state-of-the-art results for the node classification task on benchmark datasets.
    @INPROCEEDINGS { nguyen_etal_ecml20_selfattention,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) },
        TITLE = { A Self-Attention Network based Node Embedding Model },
        YEAR = { 2020 },
        ABSTRACT = { Despite recent progress, limited research has been conducted in the inductive setting, where embeddings are required for newly unseen nodes – a setting commonly encountered in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE – a novel unsupervised embedding model – whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. As a consequence, SANNE can produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that our unsupervised SANNE obtains state-of-the-art results for the node classification task on benchmark datasets. },
    }
C
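    A rough numpy sketch of the mechanism described above: self-attention over the node vectors of one sampled random walk, followed by a feed-forward layer, so that an unseen node can be embedded from its walk context alone. Dimensions, single-head attention, and the ReLU feed-forward layer are simplifying assumptions.
        import numpy as np

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        def self_attention_walk(X, Wq, Wk, Wv, Wff):
            """X: (walk_len, d) vectors of the nodes visited by one random walk."""
            Q, K, V = X @ Wq, X @ Wk, X @ Wv
            A = softmax(Q @ K.T / np.sqrt(K.shape[1]))    # attention across walk positions
            return np.maximum((A @ V) @ Wff, 0.0)         # aggregated, transformed node vectors

        rng = np.random.default_rng(0)
        d = 16
        X = rng.normal(size=(8, d))                       # one sampled walk of 8 nodes
        out = self_attention_walk(X, *(rng.normal(size=(d, d)) for _ in range(4)))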
  • Parameterized Rate-Distortion Stochastic Encoder
    Quan Hoang, Trung Le and Dinh Phung. In Proc. of the 37th International Conference on Machine Learning (ICML), 2020. [ | ]
    @INPROCEEDINGS { hoang_etal_icml20_parameterized,
        AUTHOR = { Quan Hoang and Trung Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 37th International Conference on Machine Learning (ICML) },
        TITLE = { Parameterized Rate-Distortion Stochastic Encoder },
        YEAR = { 2020 },
    }
C
  • Deep Generative Models of Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2020. [ | | pdf]
    In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models.
    @INPROCEEDINGS { zhao_etal_aistats20_deepgenerative,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { Deep Generative Models of Sparse and Overdispersed Discrete Data },
        BOOKTITLE = { Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2020 },
        ABSTRACT = { In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models. },
        FILE = { :zhao_etal_aistats20_deepgenerative - Deep Generative Models of Sparse and Overdispersed Discrete Data.pdf:PDF },
        URL = { https://www.semanticscholar.org/paper/Deep-Generative-Models-of-Sparse-and-Overdispersed-Zhao-Rai/8136c46488875b09e15e89c08bf02698901322a1 },
    }
C
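    The modelling choice at the heart of the entry above is scoring counts under a negative binomial rather than a Poisson or multinomial likelihood. A hedged sketch of that likelihood term, under one common (r, p) parameterisation that is not necessarily the paper's exact one:
        import numpy as np
        from scipy.special import gammaln

        def nb_log_likelihood(x, r, p, eps=1e-8):
            """log NB(x; r, p): dispersion r > 0, success probability p in (0, 1).
            Small r yields heavy tails, capturing overdispersed counts."""
            return (gammaln(x + r) - gammaln(r) - gammaln(x + 1.0)
                    + r * np.log(1.0 - p + eps) + x * np.log(p + eps))

        x = np.array([0.0, 1.0, 7.0, 30.0])        # sparse, overdispersed counts
        print(nb_log_likelihood(x, r=0.5, p=0.8))  # a decoder would emit r, p per dimension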
  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020. [ | | pdf]
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }
C
  • Stein variational gradient descent with variance reduction
    Nhan Dam, Trung Le, Viet Huynh and Dinh Phung. In Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN), jul 2020. [ | ]
    Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy, but its mini-batch estimator has large variance, hence compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of the mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements compared with baseline methods on the binary classification task and its pseudo-online learning setting, as well as the regression task. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields.
    @INPROCEEDINGS { dam_etal_ijcnn20_steinvariational,
        AUTHOR = { Nhan Dam and Trung Le and Viet Huynh and Dinh Phung },
        BOOKTITLE = { Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Stein variational gradient descent with variance reduction },
        YEAR = { 2020 },
        MONTH = { jul },
        ABSTRACT = { Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy, but its mini-batch estimator has large variance, hence compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of the mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements compared with baseline methods on the binary classification task and its pseudo-online learning setting, as well as the regression task. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields. },
        FILE = { :dam_etal_ijcnn20_steinvariational - Stein Variational Gradient Descent with Variance Reduction.pdf:PDF },
    }
C
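    For context, one plain SVGD update with an RBF kernel, where the log-density gradient at each particle is supplied by a minibatch estimate — the large-variance estimator that the paper above targets. This is a generic textbook sketch, not the proposed variance-reduced method.
        import numpy as np

        def svgd_step(X, grad_logp, h=1.0, step=0.1):
            """X: (n, d) particles; grad_logp(X) -> (n, d) minibatch estimate of
            grad log p(x) at each particle."""
            diff = X[:, None, :] - X[None, :, :]
            K = np.exp(-np.sum(diff ** 2, axis=-1) / h)     # RBF kernel matrix
            gradK = -2.0 / h * diff * K[:, :, None]         # d k(x_j, x_i) / d x_j
            phi = (K @ grad_logp(X) + gradK.sum(axis=0)) / X.shape[0]
            return X + step * phi                           # driving + repulsive terms

        # e.g. particles drifting toward a standard normal: X = svgd_step(X, lambda X: -X)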
  • OptiGAN: Generative Adversarial Networks for Goal Optimized Sequence Generation
    Mahmoud Hossam, Trung Le, Viet Huynh, Michael Papasimeon and Dinh Phung. In Proc. of the International Joint Conference on Neural Networks (IJCNN), 2020. [ | ]
    One of the challenging problems in sequence generation tasks is the optimized generation of sequences with specific desired goals. Existing sequential generative models mainly generate sequences to closely mimic the training data, without direct optimization according to desired goals or properties specific to the task. In this paper, we propose OptiGAN, a GAN-based generative model that incorporates both Generative Adversarial Networks and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and sequence generation, where it achieves higher scores, outperforming selected GAN and RL baselines, while not sacrificing output sample diversity.
    @INPROCEEDINGS { hossam_etal_ijcnn20_OptiGAN,
        AUTHOR = { Mahmoud Hossam and Trung Le and Viet Huynh and Michael Papasimeon and Dinh Phung },
        BOOKTITLE = { Proc. of the International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { OptiGAN: Generative Adversarial Networks for Goal Optimized Sequence Generation },
        YEAR = { 2020 },
        ABSTRACT = { One of the challenging problems in sequence generation tasks is the optimized generation of sequences with specific desired goals. Existing sequential generative models mainly generate sequences to closely mimic the training data, without direct optimization according to desired goals or properties specific to the task. In this paper, we propose OptiGAN, a GAN-based generative model that incorporates both Generative Adversarial Networks and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and sequence generation, where it achieves higher scores, outperforming selected GAN and RL baselines, while not sacrificing output sample diversity. },
        FILE = { :hossam_etal_ijcnn20_OptiGAN - OptiGAN_ Generative Adversarial Networks for Goal Optimized Sequence Generation.pdf:PDF },
    }
C
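    A toy REINFORCE-style loss of the kind alluded to above: generated sequences are scored by a task-specific goal reward, and the generator's log-likelihood of each sequence is weighted by its baseline-subtracted reward. The names and the mean baseline are illustrative, not the paper's exact estimator.
        import numpy as np

        def policy_gradient_loss(log_probs, rewards):
            """log_probs: (batch, seq_len) log p(token_t | prefix); rewards: (batch,)
            goal scores. Minimising this pushes probability mass toward high-reward
            sequences."""
            advantage = rewards - rewards.mean()            # simple variance-reducing baseline
            return -np.mean(advantage[:, None] * log_probs)

        lp = np.log(np.full((4, 10), 0.1))                  # dummy per-token log-probs
        print(policy_gradient_loss(lp, np.array([0.9, 0.2, 0.5, 0.1])))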
  • Code Pointer Network for Binary Function Scope Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague and Dinh Phung. In Proc. of the International Joint Conference on Neural Networks (IJCNN), 2020. [ | ]
    Function identification is a preliminary step in binary analysis for many applications, such as malware detection, common vulnerability detection and binary instrumentation. In this paper, we propose the Code Pointer Network, which leverages the underlying idea of a pointer network to efficiently and effectively tackle function scope identification – the hardest and most crucial task in function identification. We establish extensive experiments to compare our proposed method with the deep learning-based baseline. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art baseline in terms of both predictive performance and running time.
    @INPROCEEDINGS { nguyen_etal_ijcnn20_codepointer,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and Dinh Phung },
        BOOKTITLE = { Proc. of the International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Code Pointer Network for Binary Function Scope Identification },
        YEAR = { 2020 },
        ABSTRACT = { Function identification is a preliminary step in binary analysis for many applications, such as malware detection, common vulnerability detection and binary instrumentation. In this paper, we propose the Code Pointer Network, which leverages the underlying idea of a pointer network to efficiently and effectively tackle function scope identification – the hardest and most crucial task in function identification. We establish extensive experiments to compare our proposed method with the deep learning-based baseline. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art baseline in terms of both predictive performance and running time. },
        FILE = { :nguyen_etal_ijcnn20_codepointer - Code Pointer Network for Binary Function Scope Identification.pdf:PDF },
    }
C
  • Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection
    Van Nguyen, Trung Le, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020. [ | ]
    Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects that require the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA) which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. Generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space and emerges as a building block to develop deep DA approaches with state-of-the-art performance. However, deep DA approaches using the GAN principle to close the gap are subject to the mode collapsing problem that negatively impacts the predictive performance. Our aim in this paper is to propose Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD to resolve the mode collapsing problem faced in previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin.
    @INPROCEEDINGS { nguyen_etal_pakdd20_dualcomponent,
        AUTHOR = { Van Nguyen and Trung Le and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects that require the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA) which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. Generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space and emerges as a building block to develop deep DA approaches with state-of-the-art performance. However, deep DA approaches using the GAN principle to close the gap are subject to the mode collapsing problem that negatively impacts the predictive performance. Our aim in this paper is to propose Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD to resolve the mode collapsing problem faced in previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin. },
        FILE = { :nguyen_etal_pakdd20_dualcomponent - Dual Component Deep Domain Adaptation_ a New Approach for Cross Project Software Vulnerability Detection.pdf:PDF },
    }
C
  • Code Action Network for Binary Function Scope Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020. [ | ]
    Function identification is a preliminary step in binary analysis for many applications such as malware detection, common vulnerability detection and binary instrumentation. In this paper, we propose the Code Action Network (CAN), whose key idea is to encode the task of function scope identification into a sequence of three action states NI (i.e., next inclusion), NE (i.e., next exclusion), and FE (i.e., function end) to efficiently and effectively tackle function scope identification, the hardest and most crucial task in function identification. A bidirectional Recurrent Neural Network is trained to match binary programs with their sequence of action states. To work out the function scopes in a binary, the binary is first fed to a trained CAN to output its sequence of action states, which can be further decoded to recover the function scopes in the binary. We undertake extensive experiments to compare our proposed method with other state-of-the-art baselines. Experimental results demonstrate that our proposed method outperforms the state-of-the-art baselines in terms of predictive performance on real-world datasets which include binaries from well-known libraries.
    @INPROCEEDINGS { nguyen_etal_pakdd20_codeaction,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Code Action Network for Binary Function Scope Identification },
        YEAR = { 2020 },
        ABSTRACT = { Function identification is a preliminary step in binary analysis for many applications such as malware detection, common vulnerability detection and binary instrumentation. In this paper, we propose the Code Action Network (CAN), whose key idea is to encode the task of function scope identification into a sequence of three action states NI (i.e., next inclusion), NE (i.e., next exclusion), and FE (i.e., function end) to efficiently and effectively tackle function scope identification, the hardest and most crucial task in function identification. A bidirectional Recurrent Neural Network is trained to match binary programs with their sequence of action states. To work out the function scopes in a binary, the binary is first fed to a trained CAN to output its sequence of action states, which can be further decoded to recover the function scopes in the binary. We undertake extensive experiments to compare our proposed method with other state-of-the-art baselines. Experimental results demonstrate that our proposed method outperforms the state-of-the-art baselines in terms of predictive performance on real-world datasets which include binaries from well-known libraries. },
        FILE = { :nguyen_etal_pakdd20_codeaction - Code Action Network for Binary Function Scope Identification.pdf:PDF },
    }
C
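    The action-state encoding above admits a very small decoder; the sketch below assumes 'NI' opens or extends a function, 'NE' excludes an instruction, and 'FE' closes the current function — a guess at the exact semantics, for illustration only.
        def decode_scopes(actions):
            """Map a per-instruction action sequence to (start, end) function scopes."""
            scopes, start = [], None
            for i, a in enumerate(actions):
                if a == 'NI' and start is None:
                    start = i                  # first included instruction opens a scope
                elif a == 'FE' and start is not None:
                    scopes.append((start, i))  # close the current function scope
                    start = None
            return scopes

        print(decode_scopes(['NE', 'NI', 'NI', 'FE', 'NE', 'NI', 'FE']))
        # -> [(1, 3), (5, 6)]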
  • Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection
    Tuan Nguyen, Trung Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020. [ | ]
    Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical since when using commercial software, we usually only possess binary software. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and kernel methods in learning the characteristic of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results show that our proposed method convincingly outperforms the baselines.
    @INPROCEEDINGS { nguyen_etal_pakdd20_deepcost,
        AUTHOR = { Tuan Nguyen and Trung Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical since when using commercial software, we usually only possess binary software. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and kernel methods in learning the characteristic of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results show that our proposed method convincingly outperforms the baselines. },
        FILE = { :nguyen_etal_pakdd20_deepcost - Deep Cost Sensitive Kernel Machine for Binary Software Vulnerability Detection.pdf:PDF },
    }
C
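    One plausible form of the cost-sensitive component named above, combined in the paper with learned deep features: a hinge loss that charges more for missing a vulnerable binary (y = +1) than for a false alarm. The cost values are made up.
        import numpy as np

        def cost_sensitive_hinge(scores, y, c_pos=5.0, c_neg=1.0):
            """scores: real-valued outputs f(x); y in {-1, +1}."""
            costs = np.where(y > 0, c_pos, c_neg)   # heavier penalty on missed vulnerabilities
            return float(np.mean(costs * np.maximum(0.0, 1.0 - y * scores)))

        y = np.array([1, 1, -1, -1])
        print(cost_sensitive_hinge(np.array([0.2, -0.5, -1.2, 0.4]), y))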
  • Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models
    Thuy-Trang Vu, Dinh Phung and Gholamreza Haffari. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6163-6173, Online, nov 2020. [ | | pdf]
    Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of randomly masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over subsets of tokens, which we tackle efficiently through relaxation to a variational lower bound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.
    @INPROCEEDINGS { vu_etal_emnlp20_effective,
        TITLE = { Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models },
        AUTHOR = { Thuy-Trang Vu and Dinh Phung and Gholamreza Haffari },
        BOOKTITLE = { Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) },
        MONTH = { nov },
        YEAR = { 2020 },
        ADDRESS = { Online },
        PUBLISHER = { Association for Computational Linguistics },
        URL = { https://aclanthology.org/2020.emnlp-main.497 },
        DOI = { 10.18653/v1/2020.emnlp-main.497 },
        PAGES = { 6163--6173 },
        ABSTRACT = { Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of \textit{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over \textit{subsets} of tokens, which we tackle efficiently through relaxation to a variational lower bound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements. },
    }
C
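    The paper above optimises masked subsets through a variational lower bound; a greedy top-k relaxation conveys the core intuition — mask the tokens the MLM currently finds hardest to reconstruct, instead of a random subset. The per-token losses are assumed to be given.
        def adversarial_mask(tokens, per_token_loss, mask_ratio=0.15, mask_token='[MASK]'):
            """Mask the mask_ratio fraction of tokens with the highest MLM loss."""
            k = max(1, int(len(tokens) * mask_ratio))
            hardest = set(sorted(range(len(tokens)), key=lambda i: -per_token_loss[i])[:k])
            return [mask_token if i in hardest else t for i, t in enumerate(tokens)]

        print(adversarial_mask(['the', 'biopsy', 'showed', 'melanoma'],
                               [0.1, 2.3, 0.2, 1.7], mask_ratio=0.5))
        # -> ['the', '[MASK]', 'showed', '[MASK]']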
  • Improving Adversarial Robustness by Enforcing Local and Global Compactness
    Anh Bui, Trung Le, He Zhao, P. Montague, O. de Vel, T. Abraham and Dinh Phung. In Proc. of the European Conference on Computer Vision (ECCV), 2020. [ | ]
    The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances.
    @INPROCEEDINGS { bui_etal_eccv20_improving,
        AUTHOR = { Anh Bui and Trung Le and He Zhao and P. Montague and O. de Vel and T. Abraham and Dinh Phung },
        BOOKTITLE = { Proc. of the European Conference on Computer Vision (ECCV) },
        TITLE = { Improving Adversarial Robustness by Enforcing Local and Global Compactness },
        YEAR = { 2020 },
        ABSTRACT = { The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances. },
    }
C
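    A toy numpy rendering of the two regularisers named above, applied to intermediate-layer features: a local term keeping each clean example near its adversarial counterpart, and a global term keeping same-class features close together. The weights and distance choices are assumptions, not the paper's definitions.
        import numpy as np

        def compactness_loss(feat_clean, feat_adv, labels, lam_local=1.0, lam_global=0.1):
            # Local compactness: clean feature close to its adversarial counterpart.
            local = np.mean(np.sum((feat_clean - feat_adv) ** 2, axis=1))
            # Global compactness: same-label features close to one another.
            diffs = feat_clean[:, None, :] - feat_clean[None, :, :]
            same = (labels[:, None] == labels[None, :]).astype(float)
            glob = np.sum(same * np.sum(diffs ** 2, axis=-1)) / (same.sum() + 1e-8)
            return lam_local * local + lam_global * glob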
  • Using spatiotemporal distribution of geocoded Twitter data to predict US county-level health indices
    Thin Nguyen, Mark Larsen, Bridianne O’Dea, Hung Nguyen, Duc Thanh Nguyen, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. Future Generation Computer Systems, 110:620-628, 2020. [ | | pdf]
    For more than three decades, the US has annually conducted Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture health behavior and health status of its people. Though this kind of information at population level is important for local governments to identify local needs, traditional datasets take several years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. Due to the large scale of data, such as the approximately two billion tweets in this work, aggregating the tweets at a population level is common practice. While alleviating the computational cost, the aggregation operation would result in the loss of information on the distribution of data over the population, and such information may be important for identifying the health behavior and health outcomes of the population. In this work, we propose statistical features constructed on top of primary features to predict county-level health indices. The primary features include topics and linguistic patterns extracted from tweets with county-decoded information. In addition, tweeting behaviors, particularly tweeting time, are used as a predictor of the health indices. Apache Spark, an advanced cluster computing paradigm, was employed to efficiently process the large corpus of tweets, including geo-decoding the geotags, extracting low-level (primary) features, and computing the statistical features. The results show strong correlations between publicly available health indices and the features extracted from geospatially coded Twitter data. Statistical features gained higher correlation coefficients than did the aggregation ones, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. In addition, the prediction performance was also improved when the temporal information was employed. This demonstrates that the real-time analysis of social media data can provide timely insights into the health of populations.
    @ARTICLE { thin_etal_fgcs20_using_spatiotemporal,
        TITLE = { Using spatiotemporal distribution of geocoded Twitter data to predict US county-level health indices },
        JOURNAL = { Future Generation Computer Systems },
        VOLUME = { 110 },
        PAGES = { 620-628 },
        YEAR = { 2020 },
        ISSN = { 0167-739X },
        DOI = { 10.1016/j.future.2018.01.014 },
        URL = { https://www.sciencedirect.com/science/article/pii/S0167739X17312487 },
        AUTHOR = { Thin Nguyen and Mark Larsen and Bridianne O’Dea and Hung Nguyen and Duc Thanh Nguyen and John Yearwood and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        KEYWORDS = { Mining spatial and temporal data, Statistical features, Spatio-temporal features, Cluster computing, Large-scale parallel and distributed implementation, Apache Spark },
        ABSTRACT = { For more than three decades, the US has annually conducted Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture health behavior and health status of its people. Though this kind of information at population level is important for local governments to identify local needs, traditional datasets take several years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. Due to the large scale of data, such as the approximately two billion tweets in this work, aggregating the tweets at a population level is common practice. While alleviating the computational cost, the aggregation operation would result in the loss of information on the distribution of data over the population, and such information may be important for identifying the health behavior and health outcomes of the population. In this work, we propose statistical features constructed on top of primary features to predict county-level health indices. The primary features include topics and linguistic patterns extracted from tweets with county-decoded information. In addition, tweeting behaviors, particularly tweeting time, are used as a predictor of the health indices. Apache Spark, an advanced cluster computing paradigm, was employed to efficiently process the large corpus of tweets, including geo-decoding the geotags, extracting low-level (primary) features, and computing the statistical features. The results show strong correlations between publicly available health indices and the features extracted from geospatially coded Twitter data. Statistical features gained higher correlation coefficients than did the aggregation ones, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. In addition, the prediction performance was also improved when the temporal information was employed. This demonstrates that the real-time analysis of social media data can provide timely insights into the health of populations. },
    }
J
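    The "statistical features" above replace a single per-county mean with summaries of the whole distribution of a per-tweet feature (e.g. a topic proportion). A small sketch; the chosen statistics are illustrative:
        import numpy as np

        def distribution_features(values):
            """Summarise per-tweet feature values for one county."""
            v = np.asarray(values, dtype=float)
            return {
                'mean': v.mean(),            # the aggregation baseline
                'std': v.std(),              # spread across tweets
                'q25': np.percentile(v, 25),
                'q75': np.percentile(v, 75),
            }

        print(distribution_features([0.01, 0.02, 0.02, 0.40]))  # a skewed county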
  • Variational Autoencoders for Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 1684-1694, 26-28 Aug 2020. [ | | pdf]
    Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance due to the insufficient capability in modelling overdispersion and model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively.
    @INPROCEEDINGS { zhao_etal_pmlr20_variational_autoencoders,
        TITLE = { Variational Autoencoders for Sparse and Overdispersed Discrete Data },
        AUTHOR = { Zhao, He and Rai, Piyush and Du, Lan and Buntine, Wray and Phung, Dinh and Zhou, Mingyuan },
        BOOKTITLE = { Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics },
        PAGES = { 1684--1694 },
        YEAR = { 2020 },
        EDITOR = { Chiappa, Silvia and Calandra, Roberto },
        VOLUME = { 108 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 26--28 Aug },
        PUBLISHER = { PMLR },
        PDF = { http://proceedings.mlr.press/v108/zhao20c/zhao20c.pdf },
        URL = { https://proceedings.mlr.press/v108/zhao20c.html },
        ABSTRACT = { Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance due to the insufficient capability in modelling overdispersion and model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively. },
    }
C
  • Pair-Based Uncertainty and Diversity Promoting Early Active Learning for Person Re-Identification
    Wenhe Liu, Xiaojun Chang, Ling Chen, Dinh Phung, Xiaoqin Zhang, Yi Yang and Alexander G. Hauptmann. ACM Trans. Intell. Syst. Technol., 11(2), jan 2020. [ | | pdf]
    The effective training of supervised Person Re-identification (Re-ID) models requires sufficient pairwise labeled data. However, when annotation resources are limited, it is difficult to collect pairwise labeled data. We consider a challenging and practical problem called Early Active Learning, which is applied to the early stage of experiments when no pre-labeled samples are available as references for human annotation. Previous early active learning methods suffer from two limitations for Re-ID. First, these instance-based algorithms select instances rather than pairs, which can result in missing optimal pairs for Re-ID. Second, most of these methods only consider the representativeness of instances, which can result in selecting less diverse and less informative pairs. To overcome these limitations, we propose a novel pair-based active learning method for Re-ID. Our algorithm selects pairs instead of instances from the entire dataset for annotation. Besides representativeness, we further take into account the uncertainty and the diversity in terms of pairwise relations. Therefore, our algorithm can produce the most representative, informative, and diverse pairs for Re-ID data annotation. Extensive experimental results on five benchmark Re-ID datasets have demonstrated the superiority of the proposed pair-based early active learning algorithm.
    @ARTICLE { liu_etal_acmTIST20_pair_based_uncertainty,
        AUTHOR = { Liu, Wenhe and Chang, Xiaojun and Chen, Ling and Phung, Dinh and Zhang, Xiaoqin and Yang, Yi and Hauptmann, Alexander G. },
        TITLE = { Pair-Based Uncertainty and Diversity Promoting Early Active Learning for Person Re-Identification },
        YEAR = { 2020 },
        ISSUE_DATE = { April 2020 },
        PUBLISHER = { Association for Computing Machinery },
        ADDRESS = { New York, NY, USA },
        VOLUME = { 11 },
        NUMBER = { 2 },
        ISSN = { 2157-6904 },
        URL = { https://doi.org/10.1145/3372121 },
        DOI = { 10.1145/3372121 },
        ABSTRACT = { The effective training of supervised Person Re-identification (Re-ID) models requires sufficient pairwise labeled data. However, when annotation resources are limited, it is difficult to collect pairwise labeled data. We consider a challenging and practical problem called Early Active Learning, which is applied to the early stage of experiments when no pre-labeled samples are available as references for human annotation. Previous early active learning methods suffer from two limitations for Re-ID. First, these instance-based algorithms select instances rather than pairs, which can result in missing optimal pairs for Re-ID. Second, most of these methods only consider the representativeness of instances, which can result in selecting less diverse and less informative pairs. To overcome these limitations, we propose a novel pair-based active learning method for Re-ID. Our algorithm selects pairs instead of instances from the entire dataset for annotation. Besides representativeness, we further take into account the uncertainty and the diversity in terms of pairwise relations. Therefore, our algorithm can produce the most representative, informative, and diverse pairs for Re-ID data annotation. Extensive experimental results on five benchmark Re-ID datasets have demonstrated the superiority of the proposed pair-based early active learning algorithm. },
        JOURNAL = { ACM Trans. Intell. Syst. Technol. },
        MONTH = { jan },
        ARTICLENO = { 21 },
        NUMPAGES = { 15 },
        KEYWORDS = { Active learning, person re-identification },
    }
J
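    A sketch of scoring candidate pairs rather than single instances, combining the uncertainty and diversity criteria the paper names; the particular formulas below are placeholders, not the published objective.
        import numpy as np

        def rank_pairs(E, prob_same, selected, alpha=1.0, beta=1.0):
            """E: (n, d) embeddings; prob_same[i, j]: model's P(same identity);
            selected: list of already-chosen (i, j) pairs."""
            n, scored = E.shape[0], []
            for i in range(n):
                for j in range(i + 1, n):
                    p = float(prob_same[i, j])
                    # Binary entropy: highest when the model is most unsure about the pair.
                    unc = -(p * np.log(p + 1e-8) + (1 - p) * np.log(1 - p + 1e-8))
                    mid = (E[i] + E[j]) / 2
                    div = min((np.linalg.norm(mid - (E[a] + E[b]) / 2)
                               for a, b in selected), default=1.0)
                    scored.append(((i, j), alpha * unc + beta * div))
            return sorted(scored, key=lambda t: -t[1])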
2019
  • A Bayesian Extension to VAEs for Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of Bayesian Deep Learning (NeurIPS 2019 Workshop), dec 2019. [ | ]
    @INPROCEEDINGS { zhao_etal_bdl19_bayesianextension,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { A Bayesian Extension to {VAE}s for Discrete Data },
        BOOKTITLE = { Proc. of Bayesian Deep Learning (NeurIPS 2019 Workshop) },
        YEAR = { 2019 },
        MONTH = { dec },
    }
C
  • An effective spatial-temporal attention based neural network for traffic flow prediction
    Loan N.N. Do, Hai L. Vu, Bao Q. Vo, Zhiyuan Liu and Dinh Phung. Transportation Research Part C: Emerging Technologies, 108:12-28, 2019. [ | | pdf]
    Due to its importance in Intelligent Transport Systems (ITS), traffic flow prediction has been the focus of many studies in the last few decades. Existing traffic flow prediction models mainly extract static spatial-temporal correlations, although these correlations are known to be dynamic in traffic networks. Attention-based models have emerged in recent years, mostly in the field of natural language processing, and have resulted in major progress in terms of both accuracy and interpretability. This inspires us to introduce the application of attention for traffic flow prediction. In this study, a deep learning based traffic flow predictor with spatial and temporal attentions (STANN) is proposed. The spatial and temporal attentions are used to exploit the spatial dependencies between road segments and temporal dependencies between time steps respectively. Experimental results with a real-world traffic dataset demonstrate the superior performance of the proposed model. The results also show that the utilization of multiple data resolutions could help improve prediction accuracy. Furthermore, the proposed model is demonstrated to have potential for improving the understanding of spatial-temporal correlations in a traffic network.
    @ARTICLE { do_etal_trc19_AnEffective,
        AUTHOR = { Loan N.N. Do and Hai L. Vu and Bao Q. Vo and Zhiyuan Liu and Dinh Phung },
        TITLE = { An effective spatial-temporal attention based neural network for traffic flow prediction },
        JOURNAL = { Transportation Research Part C: Emerging Technologies },
        YEAR = { 2019 },
        VOLUME = { 108 },
        PAGES = { 12--28 },
        ISSN = { 0968-090X },
        ABSTRACT = { Due to its importance in Intelligent Transport Systems (ITS), traffic flow prediction has been the focus of many studies in the last few decades. Existing traffic flow prediction models mainly extract static spatial-temporal correlations, although these correlations are known to be dynamic in traffic networks. Attention-based models have emerged in recent years, mostly in the field of natural language processing, and have resulted in major progress in terms of both accuracy and interpretability. This inspires us to introduce the application of attention for traffic flow prediction. In this study, a deep learning based traffic flow predictor with spatial and temporal attentions (STANN) is proposed. The spatial and temporal attentions are used to exploit the spatial dependencies between road segments and temporal dependencies between time steps respectively. Experimental results with a real-world traffic dataset demonstrate the superior performance of the proposed model. The results also show that the utilization of multiple data resolutions could help improve prediction accuracy. Furthermore, the proposed model is demonstrated to have potential for improving the understanding of spatial-temporal correlations in a traffic network. },
        DOI = { 10.1016/j.trc.2019.09.008 },
        FILE = { :do_etal_trc19_AnEffective - An Effective Spatial Temporal Attention Based Neural Network for Traffic Flow Prediction.pdf:PDF },
        KEYWORDS = { Traffic flow prediction, Traffic flow forecasting, Deep learning, Neural network, Attention },
        URL = { http://www.sciencedirect.com/science/article/pii/S0968090X19301330 },
    }
J
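    A minimal numpy sketch of the spatial-attention step described above: a target road segment attends over the recent flow histories of all segments, and the attention-weighted history feeds the predictor. The shapes and the bilinear score are invented for illustration.
        import numpy as np

        def spatial_attention(flows, target_idx, W):
            """flows: (n_segments, n_steps) recent flow history per segment."""
            q = flows[target_idx] @ W                        # query from the target segment
            scores = flows @ q                               # relevance of every segment
            w = np.exp(scores - scores.max())
            w /= w.sum()                                     # softmax attention weights
            return w @ flows                                 # attended history, shape (n_steps,)

        rng = np.random.default_rng(1)
        flows = rng.random((5, 12))                          # 5 segments, 12 time steps
        context = spatial_attention(flows, target_idx=0, W=rng.random((12, 12)))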
  • Learning Generative Adversarial Networks from Multiple Data Sources
    Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2823-2829, July 2019. [ | | pdf]
    Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
    @INPROCEEDINGS { le_etal_ijcai19_learningGAN,
        AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2019 },
        PAGES = { 2823--2829 },
        MONTH = { July },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },
        FILE = { :le_etal_ijcai19_learningGAN - Learning Generative Adversarial Networks from Multiple Data Sources.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/391 },
    }
C
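    One plausible way to write a pull-plus-push objective of the kind sketched in the entry above (not the paper's exact formulation): with the usual discriminator D for the primary source and an auxiliary discriminator D_k trained to recognise samples resembling auxiliary source k,
        \min_{G}\; \underbrace{\mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big]}_{\text{pull toward the primary source}}
        \;+\; \lambda \sum_{k=1}^{K} \underbrace{\mathbb{E}_{z \sim P_z}\big[\log D_k(G(z))\big]}_{\text{push away from auxiliary source } k}
    Minimising $\log D_k(G(z))$ thrusts generated samples away from regions that the $k$-th auxiliary discriminator attributes to its source.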
  • Three-Player Wasserstein GAN via Amortised Duality
    Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, (IJCAI), pages 2202-2208, July 2019. [ | | pdf]
    We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method.
    @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,
        AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },
        YEAR = { 2019 },
        MONTH = { July },
        PAGES = { 2202--2208 },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method. },
        FILE = { :dam_etal_ijcai19_3pwgan - Three Player Wasserstein GAN Via Amortised Duality.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/305 },
    }
C
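    For readers wanting the shape of the objective, the following is a sketch in our own notation (not copied from the paper) of the amortised Kantorovich duality that gives rise to the three players, with critic f and mover M:
        % Sketch in our own notation, not copied from the paper.
        % f: critic, G: generator, M: mover amortising the inner minimisation.
        \[
          W_c(P_{\mathrm{data}}, P_G) \;=\; \max_f \;
            \mathbb{E}_{x \sim P_{\mathrm{data}}}\!\big[f(x)\big]
            + \mathbb{E}_{z}\Big[\min_{\tilde{x}}
              \big\{\, c\big(\tilde{x}, G(z)\big) - f(\tilde{x}) \,\big\}\Big],
          \qquad \tilde{x} \approx M\big(G(z)\big),
        \]
        % giving the sequential min-max-min game over (G, f, M).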
  • Learning How to Active Learn by Dreaming
    Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, July 2019. [ | ]
    @INPROCEEDINGS { vu_etal_acl19_learning,
        AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },
        TITLE = { Learning How to Active Learn by Dreaming },
        BOOKTITLE = { Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) },
        YEAR = { 2019 },
        ADDRESS = { Florence, Italy },
        MONTH = { jul },
    }
C
  • Deep Domain Adaptation for Vulnerable Code Function Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier DeVel, Paul Montague, Lizhen Qu and Dinh Phung. In Int. Joint Conf. on Neural Networks (IJCNN), 2019. [ | ]
    Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to indicate that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture have better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, and other baselines.
    @INPROCEEDINGS { van_etal_ijcnn19_deepdomain,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier DeVel and Paul Montague and Lizhen Qu and Dinh Phung },
        TITLE = { Deep Domain Adaptation for Vulnerable Code Function Identification },
        BOOKTITLE = { Int. Joint Conf. on Neural Networks (IJCNN) },
        YEAR = { 2019 },
        ABSTRACT = { Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to indicate that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture have better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, and other baselines. },
        FILE = { :van_etal_ijcnn19_deepdomain - Deep Domain Adaptation for Vulnerable Code Function Identification.pdf:PDF },
    }
C
  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, June 2019. [ | | pdf]
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }
C
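    A heavily reduced NumPy sketch of the CapsE scoring flow follows; dynamic routing is omitted and all parameters are random and untrained, so this only illustrates the triple-matrix-to-vector-length pipeline, not the trained model:
        # Heavily simplified sketch (dynamic routing omitted; random, untrained
        # parameters), showing only the triple-matrix -> vector-length flow.
        import numpy as np

        rng = np.random.default_rng(1)
        k, tau = 8, 4                                   # embedding dim, filter count

        emb = {n: rng.normal(size=k) for n in ("h", "r", "t")}
        filters = rng.normal(size=(tau, 3))             # each filter spans one row's 3 entries
        W = rng.normal(size=(tau * k, 5))               # features -> one 5-d capsule

        def squash(v):
            n2 = v @ v
            return (n2 / (1.0 + n2)) * v / np.sqrt(n2)  # capsule non-linearity

        def capse_score(h, r, t):
            M = np.stack([emb[h], emb[r], emb[t]], axis=1)              # k x 3 matrix
            feats = np.concatenate([np.maximum(M @ f, 0) for f in filters])
            return np.linalg.norm(squash(feats @ W))                    # plausibility = length

        print(capse_score("h", "r", "t"))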
  • Probabilistic Multilevel Clustering via Composite Transportation Distance
    Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, April 2019. [ | | pdf]
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_aistats19_probabilistic,
        AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },
        TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },
        BOOKTITLE = { Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2019 },
        ADDRESS = { Okinawa, Japan },
        MONTH = { apr },
        ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },
        FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },
        URL = { https://arxiv.org/abs/1810.11911 },
    }
C
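    In rough notation (ours, not the paper's exact statement), the composite transportation distance couples the weights of two mixing measures under a Kullback-Leibler ground cost between components:
        % Rough rendering in our notation, not the paper's exact statement:
        % transportation distance between mixing measures with a KL ground cost.
        \[
          d_{\mathrm{CT}}(G, G') \;=\;
            \min_{\pi \in \Pi(\omega, \omega')} \;
            \sum_{i,j} \pi_{ij}\,
            \mathrm{KL}\big( p_{\theta_i} \,\|\, p_{\theta'_j} \big),
          \qquad
          G = \sum_i \omega_i \delta_{\theta_i},\;\;
          G' = \sum_j \omega'_j \delta_{\theta'_j}.
        \]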
  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019. [ | | pdf]
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }
C
  • Robust Anomaly Detection in Videos using Multilevel Representations
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019. [ | | pdf]
    @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },
        BOOKTITLE = { Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2019 },
        ADDRESS = { Honolulu, USA },
        FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },
        GROUPS = { Anomaly Detection },
        URL = { https://github.com/SeaOtter/vad_gan },
    }
C
  • Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review
    Ralph Maddison, Susie Cartledge, Michelle Rogerson, Nicole Sylvia Goedhart, Tarveen Ragbir Singh, Christopher Neil, Dinh Phung and Kylie Ball. JMIR mHealth and uHealth, 7(1):e10371, Jan 2019. [ | | pdf]
    Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context.
    @ARTICLE { maddison_etal_jmir19_usefulness,
        AUTHOR = { Ralph Maddison and Susie Cartledge and Michelle Rogerson and Nicole Sylvia Goedhart and Tarveen Ragbir Singh and Christopher Neil and Dinh Phung and Kylie Ball },
        JOURNAL = { JMIR mHealth and uHealth },
        TITLE = { Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review },
        YEAR = { 2019 },
        ISSN = { 2291-5222 },
        MONTH = { Jan },
        NUMBER = { 1 },
        PAGES = { e10371 },
        VOLUME = { 7 },
        ABSTRACT = { Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context. },
        DAY = { 03 },
        DOI = { 10.2196/10371 },
        FILE = { :ralph_etal_jmir19_usefulness - Usefulness of Wearable Cameras As a Tool to Enhance Chronic Disease Self Management_ Scoping Review.pdf:PDF },
        KEYWORDS = { eHealth; review; cameras; life-logging; lifestyle behavior; chronic disease },
        URL = { https://mhealth.jmir.org/2019/1/e10371/ },
    }
J
  • On Deep Domain Adaptation: Some Theoretical Understandings
    Trung Le, Khanh Nguyen, Nhat Ho, Hung Bui and Dinh Phung, June 2019. [ | | pdf]
    Compared with shallow domain adaptation, recent progress in deep domain adaptation has shown that it can achieve higher predictive performance and stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly intuitive and powerful; however, limited theoretical understanding has been developed to support its underpinning principle. In this paper, we have provided a rigorous framework to explain why it is possible to close the gap between the target and source domains in the joint space. More specifically, we first study the loss incurred when performing transfer learning from the source to the target domain. This provides a theory that explains and generalizes existing work in deep domain adaptation which was mainly empirical. This enables us to further explain why closing the gap in the joint space can directly minimize the loss incurred for transfer learning between the two domains. To our knowledge, this offers the first theoretical result that characterizes a direct bound on the joint space and the gain of transfer learning via deep domain adaptation.
    @MISC { le_etal_arxiv19_ondeepdomain,
        AUTHOR = { Trung Le and Khanh Nguyen and Nhat Ho and Hung Bui and Dinh Phung },
        TITLE = { On Deep Domain Adaptation: Some Theoretical Understandings },
        MONTH = { jun },
        YEAR = { 2019 },
        ABSTRACT = { Compared with shallow domain adaptation, recent progress in deep domain adaptation has shown that it can achieve higher predictive performance and stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly intuitive and powerful; however, limited theoretical understanding has been developed to support its underpinning principle. In this paper, we have provided a rigorous framework to explain why it is possible to close the gap between the target and source domains in the joint space. More specifically, we first study the loss incurred when performing transfer learning from the source to the target domain. This provides a theory that explains and generalizes existing work in deep domain adaptation which was mainly empirical. This enables us to further explain why closing the gap in the joint space can directly minimize the loss incurred for transfer learning between the two domains. To our knowledge, this offers the first theoretical result that characterizes a direct bound on the joint space and the gain of transfer learning via deep domain adaptation. },
        ARCHIVEPREFIX = { arXiv },
        JOURNAL = { arXiv },
        URL = { http://arxiv.org/abs/1811.06199 },
    }
  • On Scalable Variant of Wasserstein Barycenter
    Tam Le, Viet Huynh, Nhat Ho, Dinh Phung and Makoto Yamada, 2019. [ | ]
    We study a variant of the Wasserstein barycenter problem, which we refer to as tree-sliced Wasserstein barycenter, by leveraging the structure of tree metrics for the ground metrics in the formulation of Wasserstein distance. Drawing on the tree structure, we propose efficient algorithms for solving the unconstrained and constrained versions of tree-sliced Wasserstein barycenter. The algorithms have fast computational time and efficient memory usage, especially for high dimensional settings, while demonstrating favorable results when the tree metrics are appropriately constructed. Experimental results on large-scale synthetic and real datasets from Wasserstein barycenter for documents with word embedding, multilevel clustering, and scalable Bayes problems show the advantages of tree-sliced Wasserstein barycenter over (Sinkhorn) Wasserstein barycenter.
    @MISC { le_etal_arxiv19_scalable,
        AUTHOR = { Tam Le and Viet Huynh and Nhat Ho and Dinh Phung and Makoto Yamada },
        TITLE = { On Scalable Variant of Wasserstein Barycenter },
        YEAR = { 2019 },
        ABSTRACT = { We study a variant of the Wasserstein barycenter problem, which we refer to as \emph{tree-sliced Wasserstein barycenter}, by leveraging the structure of tree metrics for the ground metrics in the formulation of Wasserstein distance. Drawing on the tree structure, we propose efficient algorithms for solving the unconstrained and constrained versions of tree-sliced Wasserstein barycenter. The algorithms have fast computational time and efficient memory usage, especially for high dimensional settings, while demonstrating favorable results when the tree metrics are appropriately constructed. Experimental results on large-scale synthetic and real datasets from Wasserstein barycenter for documents with word embedding, multilevel clustering, and scalable Bayes problems show the advantages of tree-sliced Wasserstein barycenter over (Sinkhorn) Wasserstein barycenter. },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1910.04483 },
        PRIMARYCLASS = { stat.ML },
    }
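    The computational appeal of tree metrics is that Wasserstein-1 has a closed form on a tree: a sum over edges of edge length times the absolute difference in subtree mass. A minimal NumPy sketch on a hand-built toy tree (ours, not the paper's algorithm) follows:
        # Toy sketch (ours, not the paper's algorithm): W1 on a tree metric is
        # a sum over edges of edge length times |difference in subtree mass|.
        import numpy as np

        # parent[i] = parent of node i (root has -1); length[i] = edge to parent.
        # Nodes are ordered children-before-parents so one pass suffices.
        parent = np.array([4, 4, 5, 5, 6, 6, -1])
        length = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0.0])

        def tree_wasserstein(mu, nu):
            diff = (mu - nu).astype(float)
            total = 0.0
            for v in range(len(parent) - 1):    # skip the root
                total += length[v] * abs(diff[v])
                diff[parent[v]] += diff[v]      # push subtree mass to the parent
            return total

        mu = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0])
        nu = np.array([0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0])
        print(tree_wasserstein(mu, nu))         # 6.0: each 0.5 mass travels distance 6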
  • Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions
    He Zhao, Trung Le, Paul Montague, Olivier De Vel, Tamas Abraham and Dinh Phung, 2019. [ | ]
    @MISC { zhao_etal_arxiv19_perturbations,
        AUTHOR = { He Zhao and Trung Le and Paul Montague and Olivier De Vel and Tamas Abraham and Dinh Phung },
        TITLE = { Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1910.01329 },
        PRIMARYCLASS = { cs.LG },
    }
  • Unsupervised Universal Self-Attention Network for Graph Classification
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung, 2019. [ | ]
    @MISC { nguyen_etal_arxiv19_unsupervised,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Unsupervised Universal Self-Attention Network for Graph Classification },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1909.11855 },
        PRIMARYCLASS = { cs.LG },
    }
  • On Efficient Multilevel Clustering via Wasserstein Distances
    Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui and Dinh Phung, 2019. [ | ]
    @MISC { huynh_etal_arxiv19_efficient,
        AUTHOR = { Viet Huynh and Nhat Ho and Nhan Dam and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Dinh Phung },
        TITLE = { On Efficient Multilevel Clustering via Wasserstein Distances },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1909.08787 },
        PRIMARYCLASS = { stat.ML },
    }
2018
  • Model-Based Learning for Point Pattern Data
    Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018. [ | | pdf]
    This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
    @ARTICLE { vo_etal_pr18_modelbased,
        AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },
        JOURNAL = { Pattern Recognition (PR) },
        TITLE = { Model-Based Learning for Point Pattern Data },
        YEAR = { 2018 },
        ISSN = { 0031-3203 },
        PAGES = { 136--151 },
        VOLUME = { 84 },
        ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },
        DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },
        FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },
        KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },
        PUBLISHER = { Elsevier },
        URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },
    }
J
  • Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data
    Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, August 2018. [ | ]
    Kernel methods are powerful supervised machine learning models owing to their strong generalization ability, especially their capacity to generalize effectively from limited data to unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which brings into question model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter-tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals.
    @INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },
        TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },
        BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },
        YEAR = { 2018 },
        ADDRESS = { London, UK },
        MONTH = { aug },
        PUBLISHER = { ACM },
        ABSTRACT = { Kernel methods are powerful supervised machine learning models owing to their strong generalization ability, especially their capacity to generalize effectively from limited data to unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which brings into question model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter-tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals. },
        FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },
    }
C
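    BKM's learning algorithm builds on the SVGD particle update. Below is a minimal NumPy sketch of that update on a toy 1-D standard-normal target; the kernel bandwidth, step size, and target are illustrative choices, not the paper's settings:
        # Minimal sketch of the SVGD update (toy 1-D standard-normal target;
        # bandwidth and step size are illustrative, not the paper's settings).
        import numpy as np

        def svgd_step(theta, grad_logp, h=0.5, eps=0.1):
            n = len(theta)
            diff = theta[:, None] - theta[None, :]
            K = np.exp(-diff ** 2 / (2 * h ** 2))       # RBF kernel matrix
            dK = -diff / h ** 2 * K                     # d k(theta_j, .) / d theta_j
            # phi_i = (1/n) sum_j [ k_ji * grad log p(theta_j) + d_j k_ji ]
            phi = (K * grad_logp(theta)[:, None] + dK).sum(axis=0) / n
            return theta + eps * phi

        grad_logp = lambda t: -t                        # grad log N(0, 1)
        theta = np.random.default_rng(0).uniform(-4, 4, size=50)
        for _ in range(500):
            theta = svgd_step(theta, grad_logp)
        print(theta.mean(), theta.std())                # roughly 0 and 1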
  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. [ | | pdf]
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }
C
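    The mixture mechanism itself is simple to picture: draw a generator index at random, then sample from that generator. A bare-bones sketch with toy linear "generators" (the discriminator/classifier training loop is omitted) follows:
        # Bare-bones sketch of the mixture sampling mechanism with toy linear
        # "generators"; the discriminator/classifier training loop is omitted.
        import numpy as np

        rng = np.random.default_rng(0)
        n_gen, z_dim, x_dim = 4, 2, 2

        A = rng.normal(size=(n_gen, x_dim, z_dim))      # one map per generator
        b = rng.normal(scale=3.0, size=(n_gen, x_dim))  # offsets: one mode each

        def mgan_sample(batch):
            ks = rng.integers(n_gen, size=batch)        # random generator index
            zs = rng.normal(size=(batch, z_dim))        # shared noise prior
            return np.array([A[k] @ z + b[k] for k, z in zip(ks, zs)]), ks

        x, ks = mgan_sample(8)
        print(x.shape, ks)                              # (8, 2) plus source indices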
  • Geometric enclosing networks
    Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 2355-2361, July 2018. [ | ]
    Training models to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapsing, and efficient learning of data manifold representations in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data.
    @INPROCEEDINGS { le_etal_ijcai18_geometric,
        AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Geometric enclosing networks },
        BOOKTITLE = { Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, {IJCAI-18} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        PAGES = { 2355--2361 },
        YEAR = { 2018 },
        MONTH = { July },
        ABSTRACT = { Training models to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapsing, and efficient learning of data manifold representations in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data. },
        FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },
    }
C
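    The enclosing-ball principle can be sketched as a soft minimal-enclosing-ball loss in feature space. In the toy below, phi is a fixed random feature map and the centre is a crude estimate, whereas GEN learns the mapping and the generator jointly, so this is an assumption-laden illustration rather than the paper's formulation:
        # Assumption-laden toy (phi fixed and random, centre crudely estimated);
        # GEN itself learns the mapping and the generator jointly.
        import numpy as np

        rng = np.random.default_rng(0)
        Wf = rng.normal(size=(2, 16))

        def phi(x):
            return np.tanh(x @ Wf)                      # stand-in feature map

        def enclosing_ball_loss(x, c, R, nu=0.1):
            # Soft ball objective: R^2 + (1/nu) * mean(max(0, ||phi(x) - c||^2 - R^2))
            d2 = ((phi(x) - c) ** 2).sum(axis=1)
            return R ** 2 + np.maximum(0.0, d2 - R ** 2).mean() / nu

        x_train = rng.normal(size=(100, 2))
        c = phi(x_train).mean(axis=0)
        print(enclosing_ball_loss(x_train, c, R=1.0))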
  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [ | | pdf]
    We introduce a novel embedding method for the knowledge base completion task. Our approach advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters, all of the same 1x3 shape, are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets, WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce a novel embedding method for the knowledge base completion task. Our approach advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters, all of the same 1x3 shape, are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets, WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }
C
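    The ConvKB score described above is small enough to write out. The toy NumPy sketch below uses random, untrained parameters; the example entity/relation names and the ReLU non-linearity are our illustrative choices:
        # Toy sketch with random, untrained parameters; the entity/relation
        # names and the ReLU non-linearity are our illustrative choices.
        import numpy as np

        rng = np.random.default_rng(0)
        k, tau = 8, 3                                   # embedding dim, filter count

        emb = {n: rng.normal(size=k) for n in ("melbourne", "city_of", "australia")}
        filters = rng.normal(size=(tau, 3))             # each filter has shape 1x3
        w = rng.normal(size=tau * k)                    # weights for the dot product

        def convkb_score(h, r, t):
            M = np.stack([emb[h], emb[r], emb[t]], axis=1)       # k x 3 input matrix
            maps = [np.maximum(M @ f, 0.0) for f in filters]     # k-dim feature maps
            return np.concatenate(maps) @ w                      # scalar triple score

        print(convkb_score("melbourne", "city_of", "australia"))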
  • Text Generation with Deep Variational GAN
    Mahmoud Hossam, Trung Le, Michael Papasimeon, Viet Huynh and Dinh Phung. In 32nd Neural Information Processing Systems (NIPS) Workshop on Bayesian Deep Learning, 2018. [ | ]
    Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, mode-collapsing remains a major issue for current models. In this paper we propose a GAN-based generic framework to address the problem of mode-collapse in a principled way. We change the standard GAN objective to maximize a variational lower-bound of the log-likelihood while minimizing the Jensen-Shannon divergence between data and model distributions. We experiment with our model on the text generation task and show that it can generate realistic text with high diversity.
    @INPROCEEDINGS { hossam_etal_bdl18_textgeneration,
        AUTHOR = { Mahmoud Hossam and Trung Le and Michael Papasimeon and Viet Huynh and Dinh Phung },
        TITLE = { Text Generation with Deep Variational {GAN} },
        BOOKTITLE = { 32nd Neural Information Processing Systems (NIPS) Workshop on Bayesian Deep Learning },
        YEAR = { 2018 },
        ABSTRACT = { Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, mode-collapsing remains a major issue for current models. In this paper we propose a GAN-based generic framework to address the problem of mode-collapse in a principled way. We change the standard GAN objective to maximize a variational lower-bound of the log-likelihood while minimizing the Jensen-Shannon divergence between data and model distributions. We experiment with our model on the text generation task and show that it can generate realistic text with high diversity. },
        FILE = { :hossam_etal_bdl18_textgeneration - Text Generation with Deep Variational GAN.pdf:PDF },
    }
C
  • Batch-normalized Deep Boltzmann Machines
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Asian Conference on Machine Learning (ACML), Beijing, China, 2018. [ | ]
    @INPROCEEDINGS { vu_etal_acml18_batchnormalized,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Batch-normalized Deep {Boltzmann} Machines },
        BOOKTITLE = { Proceedings of the Asian Conference on Machine Learning (ACML) },
        YEAR = { 2018 },
        ADDRESS = { Beijing, China },
        OWNER = { hungv },
        TIMESTAMP = { 2018.03.22 },
    }
C
  • Clustering Induced Kernel Learning
    Khanh Nguyen, Nhan Dam, Trung Le, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 10th Asian Conference on Machine Learning (ACML), pages 129-144, 14-16 Nov 2018. [ | | pdf]
    Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there appears a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, existing approaches to combining kernels cannot exploit the clustering structure carried in data, especially when data are heterogeneous. In this work, we leverage Bayesian nonparametric models (i.e., automatically growing kernel functions) with multiple kernel learning to develop a new framework that enjoys a nonparametric flavor in the context of multiple kernel learning. In particular, we propose the Clustering Induced Kernel Learning (CIK) method, which can automatically discover clustering structure from the data and train a single kernel machine to fit the data in each discovered cluster simultaneously. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels.
    @INPROCEEDINGS { nguyen_etal_acml18_clustering,
        AUTHOR = { Nguyen, Khanh and Dam, Nhan and Le, Trung and Nguyen, {Tu Dinh} and Phung, Dinh },
        TITLE = { Clustering Induced Kernel Learning },
        BOOKTITLE = { Proc. of the 10th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2018 },
        EDITOR = { Zhu, Jun and Takeuchi, Ichiro },
        VOLUME = { 95 },
        SERIES = { Proceedings of Machine Learning Research },
        PAGES = { 129--144 },
        MONTH = { 14--16 Nov },
        PUBLISHER = { PMLR },
        ABSTRACT = { Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there appears a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, existing approaches to combining kernels cannot exploit the clustering structure carried in data, especially when data are heterogeneous. In this work, we leverage Bayesian nonparametric models (i.e., automatically growing kernel functions) with multiple kernel learning to develop a new framework that enjoys a nonparametric flavor in the context of multiple kernel learning. In particular, we propose the \emph{Clustering Induced Kernel Learning} (CIK) method, which can automatically discover clustering structure from the data and train a single kernel machine to fit the data in each discovered cluster simultaneously. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels. },
        FILE = { :nguyen_etal_acml18_clustering - Clustering Induced Kernel Learning.pdf:PDF;nguyen18a.pdf:http\://proceedings.mlr.press/v95/nguyen18a/nguyen18a.pdf:PDF },
        URL = { http://proceedings.mlr.press/v95/nguyen18a.html },
    }
C
  • LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment
    Dang Nguyen, Wei Luo, Dinh Phung and Svetha Venkatesh. Knowledge-Based Systems, 2018. [ | ]
    Cancer is a worldwide problem and one of the leading causes of death. Increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understanding of cancer treatment toxicities is often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is defined as a rule where the diagnosis codes in the right hand side (e.g., a combination of toxicities/complications) temporally occur after the diagnosis codes in the left hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the parent-child relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with the temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime.
    @ARTICLE { nguyen_kbs18_ltarm,
        AUTHOR = { Dang Nguyen and Wei Luo and Dinh Phung and Svetha Venkatesh },
        TITLE = { {LTARM}: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment },
        JOURNAL = { Knowledge-Based Systems },
        YEAR = { 2018 },
        ABSTRACT = { Cancer is a worldwide problem and one of the leading causes of death. Increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understanding of cancer treatment toxicities is often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is defined as a rule where the diagnosis codes in the right hand side (e.g., a combination of toxicities/complications) temporally occur after the diagnosis codes in the left hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the parent-child relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with the temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime. },
        DOI = { https://doi.org/10.1016/j.knosys.2018.07.031 },
        FILE = { :nguyen_kbs18_ltarm - LTARM_ a Novel Temporal Association Rule Mining Method to Understand Toxicities in a Routine Cancer Treatment.pdf:PDF },
    }
J
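    To make the rule format concrete, here is a brute-force toy for single-code temporal rules a -> b ("b is diagnosed after a") with a support threshold; the paper's lattice structure is precisely what replaces this enumeration at scale, and the records and threshold below are invented:
        # Brute-force toy for single-code rules only; the paper's lattice is
        # what replaces this enumeration at scale. Records are invented.
        from itertools import permutations

        patients = [                         # time-ordered diagnosis codes
            ["chemo", "nausea", "anaemia"],
            ["chemo", "anaemia"],
            ["surgery", "infection"],
            ["chemo", "nausea"],
        ]

        def mine_tars(records, min_support=0.4):
            codes = {c for r in records for c in r}
            n = len(records)
            rules = {}
            for a, b in permutations(codes, 2):
                # Rule a -> b holds for a patient if b occurs strictly after a.
                hits = sum(1 for r in records
                           if a in r and b in r and r.index(b) > r.index(a))
                if hits / n >= min_support:
                    rules[(a, b)] = hits / n
            return rules

        # Support 0.5 for chemo -> nausea and chemo -> anaemia; others fall below 0.4.
        print(mine_tars(patients))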
  • Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web
    Hung Nguyen, Van Nguyen, Thin Nguyen, Mark Larsen, Bridianne O'Dea, Duc Thanh Nguyen, Trung Le, Dinh Phung, Svetha Venkatesh and Helen Christensen. In Proc. of the Int. Conf. on Web Information Systems Engineering (WISE), Springer, 2018. [ | ]
    Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about the links between non-textual features and the disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. Datasets with labels relevant to mental health scores, such as emotional scores, are also employed to improve performance in the prediction of mental health scores. This work introduces a deep neural network-based method integrating sub-networks for predicting affective scores and mental health outcomes from images. Experimental results show that, in predicting both emotion and mental health scores, (1) deep features largely outperform handcrafted ones and (2) the proposed network achieves better performance than separate networks.
    @INCOLLECTION { nguyen_etal_wise18_jointly,
        AUTHOR = { Hung Nguyen and Van Nguyen and Thin Nguyen and Mark Larsen and Bridianne O'Dea and Duc Thanh Nguyen and Trung Le and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web },
        BOOKTITLE = { Proc. of the Int. Conf. on Web Information Systems Engineering (WISE) },
        PUBLISHER = { Springer },
        YEAR = { 2018 },
        SERIES = { Lecture Notes in Computer Science },
        ABSTRACT = { Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about non-textual features on their links with the disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. Also, datasets with labels relevant to mental health scores, such as emotional scores, are also employed to improve the performance in prediction of mental health scores. This work introduces a deep neural network-based method integrating sub-networks on predicting affective scores and mental health outcomes from images. Experimental results have shown that in the both predictions of emotion and mental health scores, (1) deep features majorly outperform handcrafted ones and (2) the proposed network achieves better performance compared with separate networks. },
        FILE = { :nguyen_etal_wise18_jointly - Jointly Predicting Affective and Mental Health Scores Using Deep Neural Networks of Visual Cues on the Web.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.28 },
    }
BC
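
    The WISE entry above integrates sub-networks that jointly predict affective scores and mental health scores from visual features. A minimal PyTorch sketch of such a shared-trunk, two-head design (layer sizes, the MSE losses and the equal loss weighting are illustrative assumptions, not the paper's architecture):

        import torch
        import torch.nn as nn

        class JointNet(nn.Module):
            """Shared feature trunk with two regression heads: one for
            affective (emotion) scores, one for mental health scores."""
            def __init__(self, in_dim=2048, hidden=256):
                super().__init__()
                self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
                self.affect_head = nn.Linear(hidden, 1)
                self.mental_head = nn.Linear(hidden, 1)

            def forward(self, x):
                h = self.trunk(x)
                return self.affect_head(h), self.mental_head(h)

        model = JointNet()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.randn(32, 2048)                    # stand-in for pooled CNN features
        y_aff, y_mh = torch.randn(32, 1), torch.randn(32, 1)
        pred_aff, pred_mh = model(x)
        loss = nn.functional.mse_loss(pred_aff, y_aff) + nn.functional.mse_loss(pred_mh, y_mh)
        loss.backward(); opt.step()                  # both heads share trunk gradients
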
  • Learning Graph Representation via Frequent Subgraphs
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. (Student travel award). [ | ]
    @INPROCEEDINGS { nguyen_etal_sdm18_learning,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Learning Graph Representation via Frequent Subgraphs },
        BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },
        YEAR = { 2018 },
        PUBLISHER = { SIAM },
        NOTE = { Student travel award },
        FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.01.12 },
    }
C
  • Sqn2Vec: Learning Sequence Representation via Sequential Patterns with a Gap Constraint
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In ECML-PKDD, 2018. (Runner-up Best Student Machine Learning Paper Award). [ | ]
    When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.
    @INPROCEEDINGS { nguyen_etal_ecml18_sqn2vec,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { {Sqn2Vec}: Learning Sequence Representation via Sequential Patterns with a Gap Constraint },
        BOOKTITLE = { ECML-PKDD },
        YEAR = { 2018 },
        NOTE = { Runner-up Best Student Machine Learning Paper Award },
        ABSTRACT = { When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization. },
        FILE = { :nguyen_etal_ecml18_sqn2vec - Sqn2Vec_ Learning Sequence Representation Via Sequential Patterns with a Gap Constraint.pdf:PDF },
    }
C
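
    Sqn2Vec's first stage mines sequential patterns under a gap constraint before a doc2vec-style embedder treats the mined patterns as extra "words". A small sketch of the gap-constrained support check, restricted to length-2 patterns for brevity (thresholds and data are illustrative; the paper mines general patterns):

        def occurs(pattern, seq, max_gap, prev=None):
            """True if `pattern` appears in `seq` as a subsequence with at most
            `max_gap` symbols skipped between consecutive matches. Backtracking
            is needed: the earliest match for one symbol can doom the rest."""
            if not pattern:
                return True
            lo = 0 if prev is None else prev + 1
            hi = len(seq) if prev is None else min(len(seq), prev + max_gap + 2)
            return any(seq[i] == pattern[0] and occurs(pattern[1:], seq, max_gap, i)
                       for i in range(lo, hi))

        def frequent_pairs(sequences, max_gap=2, min_sup=0.5):
            alphabet = sorted({s for seq in sequences for s in seq})
            return [(a, b) for a in alphabet for b in alphabet
                    if sum(occurs((a, b), s, max_gap) for s in sequences)
                    >= min_sup * len(sequences)]

        print(frequent_pairs([list("abcab"), list("acb"), list("bca")]))
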
  • A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization
    Dai Quoc Nguyen, Dat Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. Semantic Web journal (SWJ), 2018. [ | | pdf]
    In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to the search personalization problem, which aims to tailor the search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which the ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performance than the standard ranker as well as up-to-date search personalization baselines.
    @ARTICLE { nguyen_etal_swj18_convolutional,
        AUTHOR = { Dai Quoc Nguyen and Dat Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization },
        JOURNAL = { Semantic Web journal (SWJ) },
        YEAR = { 2018 },
        ABSTRACT = { In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to search personalization problem which aims to tailor the search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple \textit(query, user, document) on which the ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performances than the standard ranker as well as up-to-date search personalization baselines. },
        FILE = { :nguyen_etal_swj18_convolutional - A Convolutional Neural Network Based Model for Knowledge Base Completion and Its Application to Search Personalization.pdf:PDF },
        URL = { http://www.semantic-web-journal.net/system/files/swj1867.pdf },
    }
J
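
    The ConvKB abstract above fully specifies the scoring pipeline: a triple forms a k x 3 matrix, 1 x 3 filters convolve over its rows, and the concatenated feature maps are reduced to a scalar by a dot product. A compact PyTorch sketch of that architecture (embedding size and filter count are illustrative):

        import torch
        import torch.nn as nn

        class ConvKB(nn.Module):
            """Score(h, r, t): embed the triple as a k x 3 matrix, convolve
            1 x 3 filters over its rows, then map the concatenated feature
            maps to a scalar validity score."""
            def __init__(self, n_ent, n_rel, k=100, n_filters=64):
                super().__init__()
                self.ent = nn.Embedding(n_ent, k)
                self.rel = nn.Embedding(n_rel, k)
                self.conv = nn.Conv2d(1, n_filters, kernel_size=(1, 3))
                self.w = nn.Linear(n_filters * k, 1, bias=False)

            def forward(self, h, r, t):
                m = torch.stack([self.ent(h), self.rel(r), self.ent(t)], dim=2)  # (B, k, 3)
                feats = torch.relu(self.conv(m.unsqueeze(1)))                    # (B, F, k, 1)
                return self.w(feats.flatten(1))                                  # (B, 1)

        model = ConvKB(n_ent=1000, n_rel=20)
        score = model(torch.tensor([0]), torch.tensor([3]), torch.tensor([42]))
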
  • GoGP: Scalable Geometric-based Gaussian Process for Online Regression
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. Knowledge and Information Systems (KAIS), May 2018. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals in the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @ARTICLE { le_etal_kais18_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Scalable Geometric-based Gaussian Process for Online Regression },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2018 },
        MONTH = { may },
        ABSTRACT = { One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared withits rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_kais18_gogp - GoGP_ Scalable Geometric Based Gaussian Process for Online Regression.pdf:PDF },
    }
J
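
    For context on what GoGP avoids: exact online GP regression refits the posterior as points arrive, which is cubic in the number of retained points. The sketch below is that naive baseline with a crude error-based novelty filter; it is not the paper's geometric algorithm (kernel width, noise and threshold are illustrative):

        import numpy as np

        def rbf(A, B, ls=1.0):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-0.5 * d2 / ls ** 2)

        class NaiveOnlineGP:
            """Keep a point only if the current posterior predicts it poorly,
            then refit exactly on the retained set (O(n^3) per refit)."""
            def __init__(self, noise=0.1, threshold=0.2):
                self.X, self.y = [], []
                self.noise, self.threshold = noise, threshold

            def predict(self, x):
                if not self.X:
                    return 0.0
                X = np.array(self.X)
                K = rbf(X, X) + self.noise ** 2 * np.eye(len(X))
                k = rbf(np.array([x]), X)[0]
                return k @ np.linalg.solve(K, np.array(self.y))

            def update(self, x, y):
                if abs(self.predict(x) - y) > self.threshold:  # crude sparsification
                    self.X.append(x); self.y.append(y)

        gp = NaiveOnlineGP()
        for x, y in zip(np.random.randn(200, 1), np.random.randn(200)):
            gp.update(x, y)     # streaming updates; GoGP makes this tractable
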
  • Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding
    Dang Nguyen, Wei Luo, Svetha Venkatesh and Dinh Phung. Journal of Medical Systems (JMS), 42(5):94, April 2018. [ | | pdf]
    Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
    @ARTICLE { nguyen_etal_jms18_effective,
        AUTHOR = { Dang Nguyen and Wei Luo and Svetha Venkatesh and Dinh Phung },
        TITLE = { Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding },
        JOURNAL = { Journal of Medical Systems (JMS) },
        YEAR = { 2018 },
        VOLUME = { 42 },
        NUMBER = { 5 },
        PAGES = { 94 },
        MONTH = { April },
        ABSTRACT = { Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data. },
        FILE = { :nguyen_etal_jms18_effective - Effective Identification of Similar Patients through Sequential Matching Over ICD Code Embedding.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.03.29 },
        URL = { https://link.springer.com/article/10.1007/s10916-018-0951-4 },
    }
J
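
    The JMS entry above matches patients by exploiting the sequential order of embedded ICD-10 codes. One standard way to score ordered sequences of vectors, sketched below, is dynamic time warping over cosine distances between code embeddings (the embeddings here are random stand-ins, and the paper's exact matching scheme may differ):

        import numpy as np

        def cosine_dist(u, v):
            return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

        def dtw(seq_a, seq_b, emb):
            """Order-respecting alignment cost between two ICD code sequences."""
            n, m = len(seq_a), len(seq_b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    c = cosine_dist(emb[seq_a[i - 1]], emb[seq_b[j - 1]])
                    D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        rng = np.random.default_rng(0)
        emb = {code: rng.standard_normal(16) for code in ["C50", "E11", "I10", "N18"]}
        print(dtw(["C50", "E11", "I10"], ["C50", "I10", "N18"], emb))
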
  • Bayesian Multi-Hyperplane Machine for Pattern Recognition
    Khanh Nguyen, Trung Le, Tu Nguyen and Dinh Phung. In Proc. of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, August 2018. [ | ]
    The existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, this approach requires an excessively time-consuming parameter search to find the set of optimal hyper-parameters. Another serious drawback of this approach is that it is often suboptimal since the optimal choice for the hyper-parameter is likely to lie outside the searching space due to the space discretization step required in grid search. To address these challenges, we propose in this paper the BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective, and aims to construct an alternative probabilistic view in such a way that its maximum a posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and augment auxiliary variables to efficiently infer model parameters and hyper-parameters via Markov chain Monte Carlo (MCMC) methods. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up to ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without any parameter tuning, and in achieving accuracies comparable with state-of-the-art baselines, while seamlessly handling large-scale datasets.
    @INPROCEEDINGS { nguyen_etal_icpr18_bayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Dinh Phung },
        TITLE = { Bayesian Multi-Hyperplane Machine for Pattern Recognition },
        BOOKTITLE = { Proc. of the 24th International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2018 },
        ADDRESS = { Beijing, China },
        MONTH = { aug },
        ABSTRACT = { Current existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, this approach requires an excessively time-consuming parameter search to find the set of optimal hyper-parameters. Another serious drawback of this approach is that it is often suboptimal since the optimal choice for the hyper-parameter is likely to lie outside the searching space due to the space discretization step required in grid search. To address these challenges, we propose in this paper BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective, and aims to construct an alternative probabilistic view in such a way that its maximuma-posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and augment auxiliary variables to efficiently infer model parameters and hyper-parameters via Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without using any parameter tuning, and in achieving comparable accuracies compared with the state-of-art baselines; in the meantime our model can seamlessly handle with large-scale datasets. },
        FILE = { :nguyen_etal_icpr18_bayesian - Bayesian Multi Hyperplane Machine for Pattern Recognition.pdf:PDF },
    }
C
2017
  • Dual Discriminator Generative Adversarial Nets
    Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS), pages 2667-2677, USA, 2017. [ | | pdf]
    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode-collapse problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database.
    @INPROCEEDINGS { tu_etal_nips17_d2gan,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },
        TITLE = { Dual Discriminator Generative Adversarial Nets },
        BOOKTITLE = { Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS) },
        YEAR = { 2017 },
        SERIES = { NIPS'17 },
        PAGES = { 2667--2677 },
        ADDRESS = { USA },
        PUBLISHER = { Curran Associates Inc. },
        ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database. },
        ACMID = { 3295027 },
        FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },
        ISBN = { 978-1-5108-6096-4 },
        LOCATION = { Long Beach, California, USA },
        NUMPAGES = { 11 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.06 },
        URL = { http://dl.acm.org/citation.cfm?id=3294996.3295027 },
    }
C
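
    The D2GAN entry above prescribes two discriminators: D1 scores real data highly, D2 favours generated data, and the generator fools both, recovering KL plus reverse KL. Below is a schematic PyTorch step in that spirit; the precise objective, its alpha/beta weights and the positivity constraint on the critics should be taken from the paper, so treat the loss terms here as a hedged reading of the abstract:

        import torch
        import torch.nn as nn

        G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
        def make_d():   # positive outputs, as the log/linear terms below require
            return nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
        D1, D2 = make_d(), make_d()
        alpha, beta = 0.2, 0.1   # illustrative weights
        opt_d = torch.optim.Adam([*D1.parameters(), *D2.parameters()], lr=2e-4)
        opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

        x = torch.randn(64, 2)   # stand-in for real data
        fake = G(torch.randn(64, 8))

        # Critics maximise: alpha*log D1(real) - D1(fake) - D2(real) + beta*log D2(fake)
        j = (alpha * torch.log(D1(x)) - D1(fake.detach())
             - D2(x) + beta * torch.log(D2(fake.detach()))).mean()
        opt_d.zero_grad(); (-j).backward(); opt_d.step()

        # Generator minimises its share of the same objective, fooling both critics.
        g_loss = (-D1(fake) + beta * torch.log(D2(fake))).mean()
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
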
  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals in the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared withits rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }
C
  • Supervised Restricted Boltzmann Machines
    Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017. [ | | pdf]
    We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampling) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning aiming at data generation.
    @INPROCEEDINGS { nguyen_etal_uai17supervised,
        AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },
        TITLE = { Supervised Restricted Boltzmann Machines },
        BOOKTITLE = { Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose in this paper the supervised re-stricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished genera-tive component for data representation and a dis-criminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., Gibbs sampler) very slow and unpractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on realworld datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioning on specific classes – a topic which is of current great interest in deep learning aiming at data generation. },
        FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.08.29 },
        URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },
    }
C
  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017. [ | | pdf]
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structural corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experiment results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }
C
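
    The ICML entry above reduces multilevel clustering to computing Wasserstein barycenters of discrete measures. With the POT library, an entropic-regularised barycenter of histograms supported on a shared grid takes a few lines (the grid, the toy histograms and the regulariser are illustrative; the paper handles general discrete measures):

        import numpy as np
        import ot   # POT: Python Optimal Transport

        grid = np.linspace(0, 1, 50)
        M = ot.dist(grid.reshape(-1, 1), grid.reshape(-1, 1))  # squared-distance cost
        M /= M.max()

        def toy_hist(center):   # a discrete measure on the grid
            h = np.exp(-((grid - center) ** 2) / 0.005)
            return h / h.sum()

        A = np.vstack([toy_hist(0.3), toy_hist(0.7)]).T        # (n_bins, n_measures)
        bary = ot.bregman.barycenter(A, M, reg=1e-2)           # entropic Wasserstein mean
        print(bary.shape)   # (50,): the barycentric "cluster centre" measure
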
  • Approximation Vector Machines for Large-scale Online Learning
    Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. [ | | pdf]
    One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard its risk of compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for the classification task, and l1, l2, and ϵ-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and the regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved predictive performance comparable with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM to maintain the model size.
    @ARTICLE { le_etal_jmlr17approximation,
        AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },
        TITLE = { Approximation Vector Machines for Large-scale Online Learning },
        JOURNAL = { Journal of Machine Learning Research (JMLR) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. },
        FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },
        KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },
        URL = { https://arxiv.org/abs/1604.06518 },
    }
J
  • Discriminative Bayesian Nonparametric Clustering
    Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models.
    @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,
        AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },
        TITLE = { Discriminative Bayesian Nonparametric Clustering },
        BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulted from probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. },
        FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },
        URL = { https://www.ijcai.org/proceedings/2017/355 },
    }
C
  • Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
    @INPROCEEDINGS { tu_etal_ijcai17_rrf,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },
        YEAR = { 2017 },
        SERIES = { IJCAI'17 },
        ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a the curse of kernelization) and the difficulty in learning kernel parameters, which often assumed to be fixed. Random Fourier feature is a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bocher’s theorem, and allows the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. },
        FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },
        LOCATION = { Melbourne, Australia },
        NUMPAGES = { 7 },
        URL = { https://www.ijcai.org/proceedings/2017/354 },
    }
C
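
    The RRF entry above lifts the randomness of the Fourier components to a base sample drawn once, so the kernel parameters receive analytic stochastic gradients. A PyTorch sketch of that reparameterization for an RBF kernel, with omega = eps / sigma and per-dimension lengthscales sigma learned by SGD (feature count and model sizes are illustrative):

        import torch
        import torch.nn as nn

        class RRFLayer(nn.Module):
            """Random Fourier features with reparameterized frequencies:
            eps ~ N(0, I) is sampled once; omega = eps / sigma, so gradients
            flow to the learnable lengthscales sigma."""
            def __init__(self, in_dim, n_features=256):
                super().__init__()
                self.register_buffer("eps", torch.randn(in_dim, n_features))
                self.log_sigma = nn.Parameter(torch.zeros(in_dim))

            def forward(self, x):
                omega = self.eps / self.log_sigma.exp().unsqueeze(1)
                proj = x @ omega
                scale = (1.0 / omega.shape[1]) ** 0.5
                return scale * torch.cat([torch.cos(proj), torch.sin(proj)], dim=1)

        feat, clf = RRFLayer(in_dim=10), nn.Linear(512, 1)
        opt = torch.optim.SGD([*feat.parameters(), *clf.parameters()], lr=0.1)
        x, y = torch.randn(32, 10), torch.randint(0, 2, (32, 1)).float()
        loss = nn.functional.binary_cross_entropy_with_logits(clf(feat(x)), y)
        loss.backward(); opt.step()   # one online step updates sigma too
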
  • Column Networks for Collective Classification
    Trang Pham, Truyen Tran, Dinh Phung and Svetha Venkatesh. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf]
    Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.
    @CONFERENCE { pham_etal_aaai17column,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Column Networks for Collective Classification },
        BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2017 },
        ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computational challenging and has not leveraged on the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. },
        COMMENT = { Accepted },
        FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.14 },
        URL = { https://arxiv.org/abs/1609.04508 },
    }
C
  • Forward-Backward Smoothing for Hidden Markov Models of Point Pattern Data
    Nhan Dam, Dinh Phung, Ba-Ngu Vo and Viet Huynh. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 252-261, Tokyo, Japan, October 2017. [ | ]
    @INPROCEEDINGS { dam_etal_dsaa17forward,
        TITLE = { Forward-Backward Smoothing for Hidden {M}arkov Models of Point Pattern Data },
        AUTHOR = { Nhan Dam and Dinh Phung and Ba-Ngu Vo and Viet Huynh },
        BOOKTITLE = { 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) },
        MONTH = { October },
        YEAR = { 2017 },
        PAGES = { 252-261 },
        ADDRESS = { Tokyo, Japan },
        FILE = { :dam_etal_dsaa17forward - Forward Backward Smoothing for Hidden Markov Models of Point Pattern Data.pdf:PDF },
        OWNER = { ndam },
        TIMESTAMP = { 2017.08.28 },
    }
C
  • Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring
    Hung Nguyen, Sarah J. Maclagan, Tu Dinh Nguyen, Thin Nguyen, Paul Flemons, Kylie Andrews, Euan G. Ritchie and Dinh Phung. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017. (Honorable Mention Application Paper). [ | ]
    Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data of wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle for scientists and ecologists seeking to monitor wildlife in an open environment. Leveraging recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, produced by citizen scientists, and state-of-the-art deep convolutional neural network architectures, to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can speed up research findings, construct more efficient citizen science-based monitoring systems and subsequent management decisions, with the potential to make significant impacts on ecology and camera-trap image analysis.
    @INPROCEEDINGS { hung_etal_dsaa17animal,
        AUTHOR = { Hung Nguyen and Sarah J. Maclagan and Tu Dinh Nguyen and Thin Nguyen and Paul Flemons and Kylie Andrews and Euan G. Ritchie and Dinh Phung },
        TITLE = { Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring },
        BOOKTITLE = { Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2017 },
        NOTE = { Honorable Mention Application Paper },
        ABSTRACT = { Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are being an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data of wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle to scientists and ecologists to monitor wildlife in an open environment. Leveraging on recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from Wildlife Spotter project, done by citizen scientists, and the state-of-the-art deep convolutional neural network architectures, to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy at 96.6% for the task of detecting images containing animal, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can therefore speed up research findings, construct more efficient citizen sciencebased monitoring systems and subsequent management decisions, having the potential to make significant impacts to the world of ecology and trap camera images analysis. },
        FILE = { :hung_etal_dsaa17animal - Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring.pdf:PDF },
        OWNER = { hung },
        TIMESTAMP = { 2017.08.28 },
    }
C
  • Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features
    Thin Nguyen, Duc Thanh Nguyen, Mark E. Larsen, Bridianne O'Dea, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. In Proceedings of the International Conference on World Wide Web (WWW), 2017. [ | | pdf]
    Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predict well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels.
    @INPROCEEDINGS { nguyen_etal_www17prediction,
        AUTHOR = { Nguyen, Thin and Nguyen, Duc Thanh and Larsen, Mark E. and O'Dea, Bridianne and Yearwood, John and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },
        TITLE = { Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features },
        BOOKTITLE = { Proceedings of the International Conference on World Wide Web (WWW) },
        YEAR = { 2017 },
        ABSTRACT = { Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in the prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels. },
        FILE = { :nguyen_etal_www17prediction - Prediction of Population Health Indices from Social Media Using Kernel Based Textual and Temporal Features.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.03.25 },
        URL = { http://dl.acm.org/citation.cfm?id=3054136 },
    }
C
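
One plausible reading of the kernel-based mid-level features in the entry above (the same construction recurs in the Decision Support Systems paper further below): stack the per-tweet low-level features of a county into a matrix and evaluate a kernel between every pair of feature columns, so the mid-level vector encodes pairwise feature relationships over the population. A minimal numpy sketch under that assumption; the paper's exact construction and kernel choices may differ.

    import numpy as np

    def rbf(u, v, gamma=1.0):
        """RBF kernel between two feature columns observed over a population."""
        d = u - v
        return np.exp(-gamma * np.dot(d, d))

    def midlevel_features(tweet_feats, gamma=1.0):
        """tweet_feats: (n_tweets, n_features) low-level features for one county.
        Returns the upper triangle of the feature-by-feature kernel matrix."""
        F = tweet_feats.shape[1]
        cols = [tweet_feats[:, j] for j in range(F)]
        out = []
        for i in range(F):
            for j in range(i, F):
                out.append(rbf(cols[i], cols[j], gamma))
        return np.asarray(out)

    # Toy usage: 100 tweets with 8 low-level features -> 36 mid-level features.
    X = np.random.rand(100, 8)
    phi = midlevel_features(X)
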
  • Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities
    Bo Dao, Thin Nguyen, Svetha Venkatesh and Dinh Phung. International Journal of Data Science and Analytics, 4:209-231, November 2017. [ | | pdf]
    Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however, analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions including depression and autism. Large datasets from online communities were crawled. We analyse sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry.
    @ARTICLE { Dao_etal_17Latent,
        AUTHOR = { Bo Dao and Thin Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities },
        JOURNAL = { International Journal of Data Science and Analytics },
        YEAR = { 2017 },
        VOLUME = { 4 },
        PAGES = { 209--231 },
        MONTH = { November },
        ABSTRACT = { Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however, analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions including depression and autism. Large datasets from online communities were crawled. We analyse sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry. },
        FILE = { :Dao_etal_17Latent - Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.31 },
        URL = { https://link.springer.com/article/10.1007/s41060-017-0073-y },
    }
J
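
The nonparametric topic discovery step described above can be prototyped with gensim's hierarchical Dirichlet process implementation, which infers the number of topics from data rather than fixing it in advance. A minimal sketch with placeholder blog posts already reduced to affective words (gensim >= 4 assumed; not the authors' inference code).

    from gensim.corpora import Dictionary
    from gensim.models import HdpModel

    # Placeholder blog posts, each reduced to its affective words.
    posts = [
        ["sad", "lonely", "tired"],
        ["happy", "excited", "grateful"],
        ["anxious", "worried", "tired"],
    ]

    dictionary = Dictionary(posts)
    corpus = [dictionary.doc2bow(p) for p in posts]

    # HDP infers the number of topics from the data, unlike LDA.
    hdp = HdpModel(corpus, id2word=dictionary)
    for topic_id, words in hdp.print_topics(num_topics=5, num_words=4):
        print(topic_id, words)
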
  • Estimating support scores of autism communities in large-scale Web information systems
    Thin Nguyen, Hung Nguyen, Svetha Venkatesh and Dinh Phung. In Proceedings of the International Conference on Web Information Systems Engineering (WISE), Springer, 2017. [ | ]
    Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health-related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantities and qualities of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than did the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports.
    @INCOLLECTION { Nguyen_etal_17Estimating,
        AUTHOR = { Thin Nguyen and Hung Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Estimating support scores of autism communities in large-scale Web information systems },
        BOOKTITLE = { Proceedings of the International Conference on Web Information Systems Engineering (WISE) },
        PUBLISHER = { Springer },
        YEAR = { 2017 },
        SERIES = { Lecture Notes in Computer Science },
        ABSTRACT = { Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health-related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantities and qualities of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than did the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports. },
        FILE = { :Nguyen_etal_17Estimating - Estimating Support Scores of Autism Communities in Large Scale Web Information Systems.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.28 },
    }
BC
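
In the spirit of the Spark-based processing in the entry above, a hypothetical PySpark fragment computing one crude conversational-involvement metric (comments per thread) per subreddit from a Reddit comment dump. The input path and the JSON field names (subreddit, link_id) are assumptions about the dump's schema, not the authors' pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("support-scores").getOrCreate()

    # Assumed schema: one JSON record per comment, with subreddit and link_id fields.
    comments = spark.read.json("reddit_comments.json")

    involvement = (comments
        .groupBy("subreddit")
        .agg(F.count("*").alias("n_comments"),
             F.countDistinct("link_id").alias("n_threads"))
        .withColumn("comments_per_thread", F.col("n_comments") / F.col("n_threads")))

    involvement.filter(F.col("subreddit") == "aspergers").show()
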
  • Kernel-based features for predicting population health indices from geocoded social media data
    Thin Nguyen, Mark E. Larsen, Bridianne O'Dea, Duc Thanh Nguyen, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. Decision Support Systems, 2017. [ | | pdf]
    When using tweets to predict a population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This would alleviate the computational cost for extracting features on each individual tweet. On the other hand, much information on the population could be lost as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features and those relationships could also convey predictive information of the health index. In this paper, we propose mid-level features, namely kernel-based features, for the prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them for two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels.
    @ARTICLE { Nguyen_etal_17Kernel,
        AUTHOR = { Thin Nguyen and Mark E. Larsen and Bridianne O'Dea and Duc Thanh Nguyen and John Yearwood and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Kernel-based features for predicting population health indices from geocoded social media data },
        JOURNAL = { Decision Support Systems },
        YEAR = { 2017 },
        VOLUME = { 0 },
        NUMBER = { 0 },
        PAGES = { 1-34 },
        ABSTRACT = { When using tweets to predict a population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This would alleviate the computational cost for extracting features on each individual tweet. On the other hand, much information on the population could be lost as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features and those relationships could also convey predictive information of the health index. In this paper, we propose mid-level features, namely kernel-based features, for the prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them for two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. },
        FILE = { :Nguyen_etal_17Kernel - Kernel Based Features for Predicting Population Health Indices from Geocoded Social Media Data.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.07.01 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0167923617301227 },
    }
J
  • Estimation of the prevalence of adverse drug reactions from social media
    Thin Nguyen, Mark Larsen, Bridianne O'Dea, Dinh Phung, Svetha Venkatesh and Helen Christensen. International Journal of Medical Informatics (IJMI), 2017. [ | | pdf]
    This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale.
    @ARTICLE { nguyen_etal_jmi17estimation,
        AUTHOR = { Thin Nguyen and Mark Larsen and Bridianne O'Dea and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Estimation of the prevalence of adverse drug reactions from social media },
        JOURNAL = { International Journal of Medical Informatics (IJMI) },
        YEAR = { 2017 },
        PAGES = { 1--17 },
        ABSTRACT = { This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. },
        FILE = { :nguyen_etal_jmi17estimation - Estimation of the Prevalence of Adverse Drug Reactions from Social Media.pdf:PDF },
        URL = { http://www.sciencedirect.com/science/article/pii/S1386505617300746 },
    }
J
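
The word2vec-based expansion of side-effect terms described above can be sketched as follows: train embeddings on tokenized posts, expand each SIDER seed term with its nearest neighbours, then count posts mentioning the expanded set. A toy illustration (gensim >= 4 assumed), not the authors' pipeline; the real corpora contained billions of records.

    from gensim.models import Word2Vec

    # Placeholder corpus: tokenized social-media posts.
    sentences = [
        ["took", "sertraline", "and", "felt", "nausea", "all", "day"],
        ["bad", "headache", "and", "dizziness", "after", "new", "dose"],
    ]

    # Train a small model; parameters are illustrative only.
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

    # Expand a SIDER side-effect term with its nearest neighbours, then count
    # mentions of the expanded set to estimate discussion prevalence.
    seed = "nausea"
    variants = {seed} | {w for w, _ in model.wv.most_similar(seed, topn=5)}
    mentions = sum(any(tok in variants for tok in s) for s in sentences)
    print(variants, mentions)
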
  • Hierarchical semi-Markov conditional random fields for deep recursive sequential data
    Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. [ | | pdf]
    We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { tran_etal_aij17hierarchical,
        AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },
        TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2017 },
        MONTH = { Feb. },
        ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },
        KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.21 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },
    }
J
  • See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I described the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version of this for the directed case has also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
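
For orientation, the linear-chain CRF that the HSCRF generalises is the standard discriminative sequence model (textbook notation, not the paper's):

    p(\mathbf{y} \mid \mathbf{x}) =
      \frac{1}{Z(\mathbf{x})}
      \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
    \qquad
    Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big).

The HSCRF nests such chains: a state at one level persists for a semi-Markov duration and recursively emits a child sequence at the level below.
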
  • Streaming Clustering with Bayesian Nonparametric Models
    Viet Huynh and Dinh Phung. Neurocomputing, 258:52-62, October 2017. [ | | pdf]
    Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inference in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update which is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm, where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework for real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated versions of variational inference algorithms for both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents.
    @ARTICLE { huynh_phung_neuro17streaming,
        AUTHOR = { Viet Huynh and Dinh Phung },
        TITLE = { Streaming Clustering with Bayesian Nonparametric Models },
        JOURNAL = { Neurocomputing },
        YEAR = { 2017 },
        VOLUME = { 258 },
        PAGES = { 52--62 },
        MONTH = { October },
        ISSN = { 0925-2312 },
        ABSTRACT = { Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inference in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update which is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm, where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework for real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated versions of variational inference algorithms for both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents. },
        FILE = { :huynh_phung_neuro17streaming - Streaming Clustering with Bayesian Nonparametric Models.pdf:PDF },
        KEYWORDS = { streaming learning, Bayesian nonparametric, variational Bayes inference, Dirichlet process, Dirichlet process mixtures, heterogeneous data sources },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.18 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0925231217304253 },
    }
J
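
Not the paper's TFVB/TFME updates, but a one-pass clusterer in the same spirit (each point is scanned once, and the number of clusters grows with the data, as in DP-means) conveys the streaming setting. The new-cluster penalty lam is an assumed tuning constant.

    import numpy as np

    def stream_cluster(stream, lam=2.0):
        """One-pass clustering with an unbounded number of clusters.
        stream: iterable of 1-D numpy arrays; lam: new-cluster penalty."""
        means, counts, assignments = [], [], []
        for x in stream:
            if means:
                d = [np.sum((x - m) ** 2) for m in means]
                k = int(np.argmin(d))
            if not means or d[k] > lam:
                means.append(x.astype(float).copy())   # spawn a new cluster
                counts.append(1)
                k = len(means) - 1
            else:
                counts[k] += 1                          # online mean update
                means[k] += (x - means[k]) / counts[k]
            assignments.append(k)
        return means, assignments

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2))])
    means, z = stream_cluster(data)
    print(len(means), "clusters discovered")
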
  • Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions
    Budhaditya Saha, Sunil Gupta, Dinh Phung and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2017. [ | | pdf]
    Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations, etc. However, EMR data analysis is complicated by missing entries. There are two reasons - the “primary reason for admission” is included in EMR, but the comorbidities (other chronic diseases) are left uncoded, and, many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data - unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction is improved significantly from 0.729 to 0.741 for Cancer data and from 0.699 to 0.723 for AMI data. Similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 for Cancer data and from 0.682 to 0.724 for AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and admissions in hospital over 3-, 6- and 12-month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods.
    @ARTICLE { budhaditya_gupta_phung_venkatesh_kais17effective,
        AUTHOR = { Budhaditya Saha and Sunil Gupta and Dinh Phung and Svetha Venkatesh },
        TITLE = { Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2017 },
        ABSTRACT = { Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations, etc. However, EMR data analysis is complicated by missing entries. There are two reasons - the “primary reason for admission” is included in EMR, but the comorbidities (other chronic diseases) are left uncoded, and, many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data - unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction is improved significantly from 0.729 to 0.741 for Cancer data and from 0.699 to 0.723 for AMI data. Similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 for Cancer data and from 0.682 to 0.724 for AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and admissions in hospital over 3-, 6- and 12-month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods. },
        FILE = { :budhaditya_gupta_phung_venkatesh_kais17effective - Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.17 },
        URL = { https://link.springer.com/article/10.1007/s10115-017-1038-0 },
    }
J
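
A minimal sketch of the core imputation idea above: factor the masked feature-by-patient matrix into two low-rank factors fitted only on observed entries. For simplicity this uses alternating gradient steps with L2 shrinkage rather than the paper's block coordinate descent with its sparsity-preserving regularizer.

    import numpy as np

    def lowrank_impute(X, mask, rank=5, lr=0.01, reg=0.1, iters=500, seed=0):
        """Fill missing entries of a feature-by-patient matrix X.
        mask: 1 where X is observed, 0 where missing."""
        rng = np.random.default_rng(seed)
        n, m = X.shape
        U = 0.1 * rng.standard_normal((n, rank))
        V = 0.1 * rng.standard_normal((m, rank))
        for _ in range(iters):
            R = mask * (U @ V.T - X)        # residual on observed entries only
            U -= lr * (R @ V + reg * U)     # gradient steps with L2 shrinkage
            V -= lr * (R.T @ U + reg * V)
        return U @ V.T                      # dense completion; threshold as needed

    X = np.random.rand(20, 30)
    mask = (np.random.rand(20, 30) > 0.5).astype(float)
    X_hat = lowrank_impute(X * mask, mask)
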
  • Energy-Based Localized Anomaly Detection in Video Surveillance
    Hung Vu, Tu Dinh Nguyen, Anthony Travers, Svetha Venkatesh and Dinh Phung. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Jeju, South Korea, May 23-26 2017. (Best Application Paper Award). [ | | pdf]
    Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing collection of data without knowledge of what should be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings, in which the trained parameters of the model are fixed in the offline setting whilst being updated incrementally with video data arriving in a stream. Experiments on three public benchmark video datasets show that our proposed method can detect and localize the abnormalities at pixel level with better accuracy than those of baselines, and achieve competitive performance compared with state-of-the-art approaches. Moreover, as the RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework.
    @INPROCEEDINGS { vu_etal_pakdd17energy,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Anthony Travers and Svetha Venkatesh and Dinh Phung },
        TITLE = { Energy-Based Localized Anomaly Detection in Video Surveillance },
        BOOKTITLE = { The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2017 },
        EDITOR = { Jinho Kim and Kyuseok Shim and Longbing Cao and Jae-Gil Lee and Xuemin Lin and Yang-Sae Moon },
        ADDRESS = { Jeju, South Korea },
        MONTH = { May 23-26 },
        NOTE = { Best Application Paper Award },
        ABSTRACT = { Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing collection of data without knowledge of what should be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings, in which the trained parameters of the model are fixed in the offline setting whilst being updated incrementally with video data arriving in a stream. Experiments on three public benchmark video datasets show that our proposed method can detect and localize the abnormalities at pixel level with better accuracy than those of baselines, and achieve competitive performance compared with state-of-the-art approaches. Moreover, as the RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework. },
        FILE = { :vu_etal_pakdd17energy - Energy Based Localized Anomaly Detection in Video Surveillance.pdf:PDF },
        OWNER = { hungv },
        TIMESTAMP = { 2017.01.31 },
        URL = { https://link.springer.com/chapter/10.1007/978-3-319-57454-7_50 },
    }
C
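
The reconstruction-error idea in the entry above, in miniature: train a Bernoulli RBM on flattened patches of normal frames and flag inputs the model reconstructs poorly. A self-contained numpy sketch with one-step contrastive divergence; the patch size, learning rate and any anomaly threshold are illustrative assumptions, and the paper's offline/streaming machinery is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    class TinyRBM:
        """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
        def __init__(self, n_vis, n_hid, lr=0.05):
            self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
            self.b = np.zeros(n_vis)   # visible bias
            self.c = np.zeros(n_hid)   # hidden bias
            self.lr = lr

        def fit_batch(self, v0):
            h0 = sigmoid(v0 @ self.W + self.c)
            hs = (rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(hs @ self.W.T + self.b)      # one-step reconstruction
            h1 = sigmoid(v1 @ self.W + self.c)
            n = v0.shape[0]
            self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
            self.b += self.lr * (v0 - v1).mean(axis=0)
            self.c += self.lr * (h0 - h1).mean(axis=0)

        def score(self, v):
            """Per-example reconstruction error; high error suggests an anomaly."""
            h = sigmoid(v @ self.W + self.c)
            v_rec = sigmoid(h @ self.W.T + self.b)
            return np.sum((v - v_rec) ** 2, axis=1)

    # Toy usage: train on "normal" binary patches, then score new patches.
    normal = (rng.random((500, 64)) < 0.2).astype(float)
    rbm = TinyRBM(64, 32)
    for _ in range(100):
        rbm.fit_batch(normal)
    print(rbm.score(normal[:5]))
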
2016
  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to the expense of their running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation to select optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to the expense of their running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameter to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation to select optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }
C
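
Label drift, in miniature: a softmax classifier whose weight matrix grows a new row whenever an unseen class label arrives in the stream. This is an illustrative toy, not the paper's one-pass OLR estimator or its Spark parallelization.

    import numpy as np

    class LabelDriftLR:
        """Softmax regression that expands its weight matrix when an
        unseen class label arrives in the stream (illustrative only)."""
        def __init__(self, n_features, lr=0.1):
            self.W = np.zeros((0, n_features))
            self.classes = {}            # label -> row index
            self.lr = lr

        def _probs(self, x):
            z = self.W @ x
            z -= z.max()                 # numerical stability
            e = np.exp(z)
            return e / e.sum()

        def partial_fit(self, x, y):
            if y not in self.classes:    # label drift: add a new class row
                self.classes[y] = len(self.classes)
                self.W = np.vstack([self.W, np.zeros(self.W.shape[1])])
            p = self._probs(x)
            t = np.zeros(len(self.classes))
            t[self.classes[y]] = 1.0
            self.W += self.lr * np.outer(t - p, x)   # one SGD step

    clf = LabelDriftLR(n_features=4)
    clf.partial_fit(np.array([1., 0., 0., 1.]), "cat")
    clf.partial_fit(np.array([0., 1., 1., 0.]), "dog")   # new class appears
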
  • Dual Space Gradient Descent for Online Learning
    Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing Systems (NIPS), December 2016. [ | | pdf]
    One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines.
    @CONFERENCE { le_etal_nips16dual,
        AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },
        TITLE = { Dual Space Gradient Descent for Online Learning },
        BOOKTITLE = { Advances in Neural Information Processing Systems (NIPS) },
        YEAR = { 2016 },
        MONTH = { December },
        ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. },
        FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.16 },
        URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },
    }
C
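
DualSGD builds on random Fourier features; the fragment below shows only that building block, not the paper's dual-space bookkeeping: approximate an RBF kernel k(x, y) = exp(-gamma ||x - y||^2) with a fixed-dimensional feature map, then run online SGD in that space so the model never grows with the data. The toy labels and step-size schedule are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_rff(d, D, gamma=1.0):
        """Random Fourier feature map approximating the RBF kernel
        k(x, y) = exp(-gamma * ||x - y||^2)."""
        W = rng.normal(0.0, np.sqrt(2 * gamma), (D, d))
        b = rng.uniform(0, 2 * np.pi, D)
        return lambda x: np.sqrt(2.0 / D) * np.cos(W @ x + b)

    # Online logistic SGD in the random-feature space: the model stays
    # finite-dimensional, so there is no growing support set to budget.
    d, D = 5, 200
    phi = make_rff(d, D)
    w = np.zeros(D)
    for t in range(1, 1001):
        x = rng.standard_normal(d)
        y = 1.0 if x[0] + x[1] > 0 else -1.0       # toy labels
        z = phi(x)
        margin = y * (w @ z)
        grad = -y * z / (1.0 + np.exp(margin))     # logistic loss gradient
        w -= grad / np.sqrt(t)                     # decaying step size
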
  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [ | | pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }
C
  • Budgeted Semi-supervised Support Vector Machine
    Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf]
    @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,
        AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },
        TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },
        BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },
    }
C
  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }
C
  • Introduction: special issue of selected papers from ACML 2014
    Dinh Phung, Hang Li, Tru Cao, Tu-Bao Ho and Zhi-Hua Zhou, editors. Machine Learning, 103(2):137-139, Springer, May 2016. [ | | pdf]
    @PROCEEDINGS { li_phung_cao_ho_zhou_acml14_selectedpapers,
        TITLE = { Introduction: special issue of selected papers from {ACML} 2014 },
        YEAR = { 2016 },
        EDITOR = { Dinh Phung and Hang Li and Tru Cao and Tu-Bao Ho and Zhi-Hua Zhou },
        VOLUME = { 103 },
        NUMBER = { 2 },
        PUBLISHER = { Springer },
        MONTH = { May },
        FILE = { :li_phung_cao_ho_zhou_acml14_selectedpapers - Introduction_ Special Issue of Selected Papers from ACML 2014.pdf:PDF },
        ISSN = { 1573-0565 },
        JOURNAL = { Machine Learning },
        OWNER = { Thanh-Binh Nguyen },
        PAGES = { 137--139 },
        TIMESTAMP = { 2016.04.11 },
        URL = { http://dx.doi.org/10.1007/s10994-016-5549-9 },
    }
P
  • Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View
    Luo, Wei, Phung, Dinh, Tran, Truyen, Gupta, Sunil, Rana, Santu, Karmakar, Chandan, Shilton, Alistair, Yearwood, John, Dimitrova, Nevenka, Ho, Bao Tu, Venkatesh, Svetha and Berk, Michael. J Med Internet Res, 18(12):e323, Dec 2016. [ | | pdf]
    Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community.
    @ARTICLE { Luo_etal_jmir16guidelines,
        AUTHOR = { Luo, Wei and Phung, Dinh and Tran, Truyen and Gupta, Sunil and Rana, Santu and Karmakar, Chandan and Shilton, Alistair and Yearwood, John and Dimitrova, Nevenka and Ho, Bao Tu and Venkatesh, Svetha and Berk, Michael },
        TITLE = { Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View },
        JOURNAL = { J Med Internet Res },
        YEAR = { 2016 },
        VOLUME = { 18 },
        NUMBER = { 12 },
        PAGES = { e323 },
        MONTH = { Dec },
        ABSTRACT = { Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. },
        DAY = { 16 },
        DOI = { 10.2196/jmir.5870 },
        FILE = { :Luo_etal_jmir16guidelines - Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research_ a Multidisciplinary View.pdf:PDF },
        KEYWORDS = { machine learning, clinical prediction rule, guideline },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.12.21 },
        URL = { http://www.jmir.org/2016/12/e323/ },
    }
J
  • Data Clustering Using Side Information Dependent Chinese Restaurant Processes
    Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(2):463-488, May 2016. [ | | pdf]
    Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach.
    @ARTICLE { li_rana_phung_venkatesh_kais16,
        AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Data Clustering Using Side Information Dependent {C}hinese Restaurant Processes },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 47 },
        NUMBER = { 2 },
        PAGES = { 463--488 },
        MONTH = { May },
        ABSTRACT = { Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach. },
        DOI = { 10.1007/s10115-015-0834-7 },
        FILE = { :li_rana_phung_venkatesh_kais16 - Data Clustering Using Side Information Dependent Chinese Restaurant Processes.pdf:PDF },
        KEYWORDS = { Side information Similarity Data clustering Bayesian nonparametric models },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.03.02 },
        URL = { http://link.springer.com/article/10.1007/s10115-015-0834-7 },
    }
J
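
The ddCRP link prior at the heart of the model above can be written down directly: customer i links to customer j with probability proportional to a decay function of their side-information distance, with self-links weighted by alpha. The exponential decay and hard threshold below are assumed stand-ins for the paper's robust decay function and its Gibbs-updated threshold.

    import numpy as np

    def ddcrp_link_prior(D, alpha=1.0, a=1.0, threshold=np.inf):
        """Row-normalised ddCRP prior over customer links: customer i links
        to j with probability proportional to f(d_ij); self-links get alpha.
        f is an exponential decay gated by a (tunable) threshold."""
        P = np.exp(-D / a) * (D < threshold)
        np.fill_diagonal(P, alpha)
        return P / P.sum(axis=1, keepdims=True)

    # Toy side-information distances between 4 documents.
    D = np.array([[0., 1., 4., 9.],
                  [1., 0., 2., 8.],
                  [4., 2., 0., 3.],
                  [9., 8., 3., 0.]])
    print(ddcrp_link_prior(D, threshold=5.0))
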
  • Multiple Kernel Learning with Data Augmentation
    Nguyen, Khanh, Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 8th Asian Conference on Machine Learning (ACML), Nov. 2016. [ | ]
    @CONFERENCE { nguyen_etal_acml16multiple,
        AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Multiple Kernel Learning with Data Augmentation },
        BOOKTITLE = { 8th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2016 },
        MONTH = { Nov. },
        FILE = { :nguyen_etal_acml16multiple - Multiple Kernel Learning with Data Augmentation.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Advances in Artificial Intelligence, volume 9992, pages 455-468, Springer, 2016. (Student travel award). [ | | pdf]
    Data scientists, with access to fast-growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirm what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka KPIs) to look for higher-order connections. We propose a new technique, Exceptional Contrast Set Mining, that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening.
    @INCOLLECTION { nguyen_etal_ai16exceptional,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious },
        BOOKTITLE = { Advances in Artificial Intelligence },
        PUBLISHER = { Springer },
        YEAR = { 2016 },
        VOLUME = { 9992 },
        PAGES = { 455--468 },
        NOTE = { Student travel award },
        ABSTRACT = { Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening. },
        FILE = { :nguyen_etal_ai16exceptional - Exceptional Contrast Set Mining_ Moving beyond the Deluge of the Obvious.pdf:PDF },
        GROUPS = { Contrast Set Mining },
        ORGANIZATION = { Springer },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.01.05 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-50127-7_39 },
    }
BC
  • SECC: Simultaneous extraction of context and community from pervasive signals
    Nguyen, T., Nguyen, V., Salim, F.D. and Phung, D. In IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM), pages 1-9, March 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.
    @INPROCEEDINGS { nguyen_nguyen_salim_phung_percom16secc,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Phung, D. },
        TITLE = { {SECC}: Simultaneous extraction of context and community from pervasive signals },
        BOOKTITLE = { IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM) },
        YEAR = { 2016 },
        PAGES = { 1-9 },
        MONTH = { March },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated. },
        DOI = { 10.1109/PERCOM.2016.7456501 },
        FILE = { :nguyen_nguyen_salim_phung_percom16secc - SECC_ Simultaneous Extraction of Context and Community from Pervasive Signals.pdf:PDF },
        KEYWORDS = { Bluetooth;Context;Context modeling;Data mining;Data models;Feature extraction;Mixture models },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7456501 },
    }
C
  • Nonparametric discovery of movement patterns from accelerometer signals
    Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D. Pattern Recognition Letters, 70(C):52-58, Jan. 2016. [ | | pdf]
    Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. Consisting of Bayesian nonparametric priors over the parameters, the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance.
    @ARTICLE { nguyen_gupta_venkatesh_phung_pr16nonparametric,
        AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },
        TITLE = { Nonparametric discovery of movement patterns from accelerometer signals },
        JOURNAL = { Pattern Recognition Letters },
        YEAR = { 2016 },
        VOLUME = { 70 },
        NUMBER = { C },
        PAGES = { 52--58 },
        MONTH = { Jan. },
        ISSN = { 0167-8655 },
        ABSTRACT = { Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. Consisting of Bayesian nonparametric priors over the parameters, the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance. },
        DOI = { http://dx.doi.org/10.1016/j.patrec.2015.11.003 },
        FILE = { :nguyen_gupta_venkatesh_phung_pr16nonparametric - Nonparametric Discovery of Movement Patterns from Accelerometer Signals.pdf:PDF },
        KEYWORDS = { Accelerometer, Activity recognition, Bayesian nonparametric, Dirichlet process, Movement intensity },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://www.sciencedirect.com/science/article/pii/S016786551500389X },
    }
J
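    As a rough illustration of the pipeline in the entry above, the sketch below computes a simple windowed intensity feature from the acceleration magnitude and lets a truncated Dirichlet-process mixture pick the number of intensity levels. The synthetic data, the feature and the use of scikit-learn's BayesianGaussianMixture as a stand-in for the paper's HDP inference are all assumptions.
        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(0)
        # Synthetic tri-axial accelerometer signal with two intensity regimes.
        acc = np.vstack([rng.normal(0.0, 0.2, size=(500, 3)),
                         rng.normal(0.0, 1.5, size=(500, 3))])

        # Simple intensity feature: standard deviation of the magnitude
        # over non-overlapping windows.
        mag = np.linalg.norm(acc, axis=1)
        w = 25
        intensity = np.array([mag[i:i + w].std() for i in range(0, len(mag) - w, w)])

        # Truncated DP mixture: superfluous components receive negligible
        # weight, approximating the nonparametric choice of the level count.
        dpgmm = BayesianGaussianMixture(
            n_components=10,
            weight_concentration_prior_type="dirichlet_process",
            random_state=0).fit(intensity.reshape(-1, 1))
        print("active intensity levels:", int((dpgmm.weights_ > 0.05).sum()))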
  • Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data
    Tran, Truyen, Luo, Wei, Phung, Dinh, Morris, Jonathan, Rickard, Kristen and Venkatesh, Svetha. In Proceedings of the 1st Machine Learning for Healthcare Conference, pages 164-177, 2016. [ | | pdf]
    Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%.
    @INPROCEEDINGS { tran_etal_mlhc16pretern,
        AUTHOR = { Tran, Truyen and Luo, Wei and Phung, Dinh and Morris, Jonathan and Rickard, Kristen and Venkatesh, Svetha },
        TITLE = { Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data },
        BOOKTITLE = { Proceedings of the 1st Machine Learning for Healthcare Conference },
        YEAR = { 2016 },
        EDITOR = { Finale Doshi-Velez and Jim Fackler and David Kale and Byron Wallace and Jenna Wiens },
        VOLUME = { 56 },
        SERIES = { JMLR Workshop and Conference Proceedings },
        PAGES = { 164--177 },
        PUBLISHER = { JMLR },
        ABSTRACT = { Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%. },
        FILE = { :tran_etal_mlhc16pretern - Preterm Birth Prediction_ Stable Selection of Interpretable Rules from High Dimensional Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.02 },
        URL = { http://jmlr.org/proceedings/papers/v56/Tran16.html },
    }
C
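    The stability-aware selection described in the entry above boils down to keeping only predictors that survive refitting under resampling. A minimal sketch under assumed settings (synthetic data, L1-penalised logistic models, an 80% keep threshold; not the paper's exact procedure):
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=600, n_features=50,
                                   n_informative=5, random_state=0)
        rng = np.random.default_rng(0)

        n_boot, counts = 100, np.zeros(X.shape[1])
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), size=len(y))   # bootstrap resample
            clf = LogisticRegression(penalty="l1", C=0.1,
                                     solver="liblinear").fit(X[idx], y[idx])
            counts += (clf.coef_.ravel() != 0)           # predictors that survived

        stable = np.where(counts / n_boot >= 0.8)[0]     # kept in >= 80% of fits
        print("stable predictors:", stable)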
  • Computer Assisted Autism Interventions for India
    Vellanki, Pratibha, Greenhill, Stewart, Duong, Thi, Phung, Dinh, Venkatesh, Svetha, Godwin, Jayashree, Achary, Kishna V. and Varkey, Blessin. In Proceedings of the 28th Australian Conference on Computer-Human Interaction, pages 618-622, New York, NY, USA, 2016. [ | | pdf]
    @INPROCEEDINGS { vellanki_etal_ozchi16computer,
        AUTHOR = { Vellanki, Pratibha and Greenhill, Stewart and Duong, Thi and Phung, Dinh and Venkatesh, Svetha and Godwin, Jayashree and Achary, Kishna V. and Varkey, Blessin },
        TITLE = { Computer Assisted Autism Interventions for {I}ndia },
        BOOKTITLE = { Proceedings of the 28th Australian Conference on Computer-Human Interaction },
        YEAR = { 2016 },
        SERIES = { OzCHI '16 },
        PAGES = { 618--622 },
        ADDRESS = { New York, NY, USA },
        PUBLISHER = { ACM },
        ACMID = { 3011007 },
        DOI = { 10.1145/3010915.3011007 },
        FILE = { :vellanki_etal_ozchi16computer - Computer Assisted Autism Interventions for India.pdf:PDF },
        ISBN = { 978-1-4503-4618-4 },
        KEYWORDS = { Hindi, India, assistive technology, autism, early intervention, translation },
        LOCATION = { Launceston, Tasmania, Australia },
        NUMPAGES = { 5 },
        URL = { http://doi.acm.org/10.1145/3010915.3011007 },
    }
C
  • A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process
    Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated.
    @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },
        TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2016 },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },
        DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },
        FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.17 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },
    }
J
  • Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data
    Vellanki, Pratibha, Duong, Thi, Gupta, Sunil, Venkatesh, Svetha and Phung, Dinh. Knowledge and Information Systems (KAIS), 2016. [ | | pdf]
    The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning.
    @ARTICLE { vellanki_etal_kis16nonparametric,
        AUTHOR = { Vellanki, Pratibha and Duong, Thi and Gupta, Sunil and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        PAGES = { 1--31 },
        ISSN = { 0219-3116 },
        ABSTRACT = { The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning. },
        DOI = { 10.1007/s10115-016-0971-7 },
        FILE = { :vellanki_etal_kis16nonparametric - Nonparametric Discovery and Analysis of Learning Patterns and Autism Subgroups from Therapeutic Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.02 },
        URL = { http://dx.doi.org/10.1007/s10115-016-0971-7 },
    }
J
  • Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data
    Gopakumar, Shivapratap, Tran, Truyen, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. JMIR Med Inform, 4(3):e25, Jul 2016. [ | | pdf]
    Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments.
    @ARTICLE { gopakumar_etal_jmir16forecasting,
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { JMIR Med Inform },
        TITLE = { Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data },
        YEAR = { 2016 },
        MONTH = { Jul },
        NUMBER = { 3 },
        PAGES = { e25 },
        VOLUME = { 4 },
        ABSTRACT = { Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7\% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. },
        DAY = { 21 },
        DOI = { 10.2196/medinform.5650 },
        FILE = { :gopakumar_etal_jmir16forecasting - Forecasting Daily Patient Outflow from a Ward Having No Real Time Clinical Data.pdf:PDF },
        KEYWORDS = { patient flow },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.02 },
        URL = { http://medinform.jmir.org/2016/3/e25/ },
    }
J
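    To make the comparison above concrete, a small sketch (entirely synthetic discharge counts and an illustrative feature set, not the study's data or its full model suite) pits a random-forest regressor against a 7-day moving-average baseline for next-day discharges:
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_absolute_error

        rng = np.random.default_rng(0)
        days = 1826
        dow = np.arange(days) % 7
        # Synthetic daily discharge counts with a weekly rhythm plus noise.
        y = 20 + 5 * np.sin(2 * np.pi * dow / 7) + rng.poisson(3, days)

        # Features: day of week and the previous seven days of discharges.
        X = np.column_stack([dow[7:]] + [y[7 - k:days - k] for k in range(1, 8)])
        y_t = y[7:]
        split = len(y_t) - 365                      # hold out the final year

        rf = RandomForestRegressor(n_estimators=200, random_state=0)
        rf.fit(X[:split], y_t[:split])
        baseline = X[split:, 1:].mean(axis=1)       # 7-day moving average
        print("MA MAE:", mean_absolute_error(y_t[split:], baseline))
        print("RF MAE:", mean_absolute_error(y_t[split:], rf.predict(X[split:])))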
  • Control Matching via Discharge Code Sequences
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Machine Learning for Health @ NIPS 2016, 2016. [ | ]
    In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying the sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We reported the results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant.
    @CONFERENCE { nguyen_etal_mlh16control,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Control Matching via Discharge Code Sequences },
        BOOKTITLE = { Machine Learning for Health @ NIPS 2016 },
        YEAR = { 2016 },
        ABSTRACT = { In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying the sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We reported the results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant. },
        FILE = { :nguyen_etal_mlh16control - Control Matching Via Discharge Code Sequences.pdf:PDF },
        JOURNAL = { arXiv preprint arXiv:1612.01812 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.06 },
    }
C
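    A minimal sketch of the two-stage idea in the entry above, using toy ICD sequences and the gensim Word2Vec API; the codes, the hyper-parameters and the greedy nearest-neighbour matching rule are illustrative assumptions rather than the paper's sequential algorithm:
        import numpy as np
        from gensim.models import Word2Vec

        # Toy discharge-code sequences, one list of ICD codes per admission.
        cases    = [["C50", "E11", "I10"], ["C50", "I10", "Z51"]]
        controls = [["E11", "I10", "K21"], ["J45", "I10", "Z51"], ["E11", "K21", "M54"]]

        model = Word2Vec(cases + controls, vector_size=16, window=3,
                         min_count=1, seed=0)

        def embed(seq):
            # Represent an admission as the mean of its code vectors.
            return np.mean([model.wv[c] for c in seq], axis=0)

        def cosine(u, v):
            return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

        # Greedy matching: each case takes its nearest still-unused control.
        unused = list(range(len(controls)))
        for i, case in enumerate(cases):
            j = max(unused, key=lambda k: cosine(embed(case), embed(controls[k])))
            unused.remove(j)
            print(f"case {i} matched to control {j}")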
  • Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies, Nov. 2016. (Best Runner-up Student Paper Award). [ | ]
    Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental well-being. In this paper, we examine how social capital, based on levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and the online depression community. We explore apparent properties of textual contents, including expressed emotions, language styles and latent topics, of a large corpus of blog posts, to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and language features derived from blog posts, suggesting discriminative features that proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings.
    @CONFERENCE { dao_etal_rivf16effect,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community },
        BOOKTITLE = { 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies },
        YEAR = { 2016 },
        MONTH = { Nov. },
        NOTE = { Best Runner-up Student Paper Award },
        ABSTRACT = { Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental well-being. In this paper, we examine how social capital, based on levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and the online depression community. We explore apparent properties of textual contents, including expressed emotions, language styles and latent topics, of a large corpus of blog posts, to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and language features derived from blog posts, suggesting discriminative features that proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings. },
        FILE = { :dao_etal_rivf16effect - Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
    }
C
  • MCNC: Multi-channel Nonparametric Clustering from Heterogeneous Data
    Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D.. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. (Finalist Best IBM Track 1 Student Paper Award). [ | ]
    Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence do not benefit from the correlating information between them -- or require data sources to be specified as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model to be a product-space. We demonstrate our framework on synthetic and real-world datasets to discover the identity--location--time (a.k.a. who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data.
    @CONFERENCE { nguyen_nguyen_venkatesh_phung_icpr16mcnc,
        AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },
        TITLE = { {MCNC}: Multi-channel Nonparametric Clustering from Heterogeneous Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        NOTE = { Finalist Best IBM Track 1 Student Paper Award },
        ABSTRACT = { Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence do not benefit from the correlating information between them -- or require data sources to be specified as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model to be a product-space. We demonstrate our framework on synthetic and real-world datasets to discover the identity--location--time (a.k.a. who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data. },
        FILE = { :nguyen_nguyen_venkatesh_phung_icpr16mcnc - MCNC_ Multi Channel Nonparametric Clustering from Heterogeneous Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Stable Clinical Prediction using Graph Support Vector Machines
    Kamkar, Iman, Gupta, Sunil, Li, Cheng, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { kamkar_gupta_li_phung_venkatesh_icpr16stable,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Li, Cheng and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable Clinical Prediction using Graph {S}upport {V}ector {M}achines },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :kamkar_gupta_li_phung_venkatesh_icpr16stable - Stable Clinical Prediction Using Graph Support Vector Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Distributed Data Augmented Support Vector Machine on Spark
    Nguyen, Tu, Nguyen, Vu, Le, Trung and Phung, Dinh. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { nguyen_nguyen_le_phung_icpr16distributed,
        AUTHOR = { Nguyen, Tu and Nguyen, Vu and Le, Trung and Phung, Dinh },
        TITLE = { Distributed Data Augmented {S}upport {V}ector {M}achine on {S}park },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :nguyen_nguyen_le_phung_icpr16distributed - Distributed Data Augmented Support Vector Machine on Spark.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Faster Training of Very Deep Networks via p-Norm Gates
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { pham_tran_phung_venkatesh_icpr16faster,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Faster Training of Very Deep Networks via p-Norm Gates },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :pham_tran_phung_venkatesh_icpr16faster - Faster Training of Very Deep Networks Via P Norm Gates.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Transfer Learning for Rare Cancer Problems via Discriminative Sparse Gaussian Graphical Model
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { budhaditya_gupta_phung_venkatesh_icpr16transfer,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Transfer Learning for Rare Cancer Problems via Discriminative Sparse {G}aussian Graphical Model },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :budhaditya_gupta_phung_venkatesh_icpr16transfer - Transfer Learning for Rare Cancer Problems Via Discriminative Sparse Gaussian Graphical Model.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Model-based Classification and Novelty Detection For Point Pattern Data
    Vo, Ba-Ngu, Tran, Nhat-Quang, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { vo_tran_phung_vo_icpr16model,
        AUTHOR = { Vo, Ba-Ngu and Tran, Nhat-Quang and Phung, Dinh and Vo, Ba-Tuong },
        TITLE = { Model-based Classification and Novelty Detection For Point Pattern Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :vo_tran_phung_vo_icpr16model - Model Based Classification and Novelty Detection for Point Pattern Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Clustering For Point Pattern Data
    Tran, Nhat-Quang, Vo, Ba-Ngu, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { tran_vo_phung_vo_icpr16clustering,
        AUTHOR = { Tran, Nhat-Quang and Vo, Ba-Ngu and Phung, Dinh and Vo, Ba-Tuong },
        TITLE = { Clustering For Point Pattern Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :tran_vo_phung_vo_icpr16clustering - Clustering for Point Pattern Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Discriminative cues for different stages of smoking cessation in online community
    Nguyen, Thin, Borland, Ron, Yearwood, John, Yong, Hua, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ]
    Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled, including thousands of posts made by thousands of users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from the second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in studies of smoking and other addictions in online settings.
    @INPROCEEDINGS { nguyen_etal_wise16discriminative,
        AUTHOR = { Nguyen, Thin and Borland, Ron and Yearwood, John and Yong, Hua and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Discriminative cues for different stages of smoking cessation in online community },
        BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2016 },
        SERIES = { Lecture Notes in Computer Science },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled, including thousands of posts made by thousands of users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from the second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in studies of smoking and other addictions in online settings. },
        FILE = { :nguyen_etal_wise16discriminative - Discriminative Cues for Different Stages of Smoking Cessation in Online Community.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2016.07.14 },
    }
C
  • Large-scale stylistic analysis of formality in academia and social media
    Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ]
    The dictum `publish or perish' has influenced the way scientists present research results so as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, it has recently been found that the proportion of positive words has risen in the content of scientific articles over the last 40 years, which probably shows the tendency of scientists to exaggerate and overstate their research results. The practice may deviate from the impersonal and formal style of academic writing. In this study the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. These aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends of these stylistic features in scientific publications over the last four decades are also discovered. Advances in cluster computing are employed to process large-scale data, with 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale.
    @INPROCEEDINGS { nguyen_etal_wise16LargeScale,
        AUTHOR = { Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Large-scale stylistic analysis of formality in academia and social media },
        BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2016 },
        SERIES = { Lecture Notes in Computer Science },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The dictum `publish or perish' has influenced the way scientists present research results so as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, it has recently been found that the proportion of positive words has risen in the content of scientific articles over the last 40 years, which probably shows the tendency of scientists to exaggerate and overstate their research results. The practice may deviate from the impersonal and formal style of academic writing. In this study the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. These aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends of these stylistic features in scientific publications over the last four decades are also discovered. Advances in cluster computing are employed to process large-scale data, with 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale. },
        FILE = { :nguyen_etal_wise16LargeScale - Large Scale Stylistic Analysis of Formality in Academia and Social Media.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2016.07.14 },
    }
C
  • Learning Multifaceted Latent Activities from Heterogeneous Mobile Data
    Nguyen, Thanh-Binh, Nguyen, Vu, Nguyen, Thuong, Venkatesh, Svetha, Kumar, Mohan and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ]
    Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure of the base measure being a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles the heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset – the StudentLife dataset. We report results and provide analysis on the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics including F1-score, NMI, RI, purity, and compare them with well-known existing baseline methods.
    @INPROCEEDINGS { nguyen_etal_dsaa16learning,
        AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Vu and Nguyen, Thuong and Venkatesh, Svetha and Kumar, Mohan and Phung, Dinh },
        TITLE = { Learning Multifaceted Latent Activities from Heterogeneous Mobile Data },
        BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2016 },
        MONTH = { Oct. },
        ABSTRACT = { Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure of the base measure being a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles the heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset – the StudentLife dataset. We report results and provide analysis on the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics including F1-score, NMI, RI, purity, and compare them with well-known existing baseline methods. },
        FILE = { :nguyen_etal_dsaa16learning - Learning Multifaceted Latent Activities from Heterogeneous Mobile Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.01 },
    }
C
  • Analysing the History of Autism Spectrum Disorder using Topic Models
    Beykikhoshk, Adham, Arandjelović, Ognjen, Venkatesh, Svetha and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ]
    We describe a novel framework for the discovery of underlying topics of a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike social media or news data, as the topic nuances in science result in new scientific directions to emerge, a new approach to model the longitudinal literature data is using topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time, or do not share the topics over epochs when they model the time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that they are shared across epochs and the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiment and the source code of the model freely available to aid other researchers by analysing the results or applying the model to their data collections.
    @INPROCEEDINGS { beykikhoshk_etal_dsaa16analysing,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Analysing the History of Autism Spectrum Disorder using Topic Models },
        BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2016 },
        MONTH = { Oct. },
        ABSTRACT = { We describe a novel framework for the discovery of underlying topics of a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike social media or news data, as the topic nuances in science result in new scientific directions to emerge, a new approach to model the longitudinal literature data is using topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time, or do not share the topics over epochs when they model the time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that they are shared across epochs and the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiment and the source code of the model freely available to aid other researchers by analysing the results or applying the model to their data collections. },
        FILE = { :beykikhoshk_etal_dsaa16analysing - Analysing the History of Autism Spectrum Disorder Using Topic Models.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.01 },
    }
C
  • A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), July 2016. [ | ]
    @ARTICLE { budhaditya_gupta_phung_venkatesh_jbhi16framework,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2016 },
        MONTH = { July },
        FILE = { :budhaditya_gupta_phung_venkatesh_jbhi16framework - A Framework for Mixed Type Multi Outcome Prediction with Applications in Healthcare.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
J
  • Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In IEEE Intl. Conf. on Multimedia and Expo (ICME), Seattle, USA, July 2016. [ | ]
    The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals in different levels of affective disorders. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework of utilizing social media as sensors of mood and emotional transitions. This work might suggest the basis of new systems to screen individuals and communities at high risks of mental health problems in online settings.
    @INPROCEEDINGS { dao_nguyen_venkatesh_phung_icme16,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities },
        BOOKTITLE = { IEEE Intl. Conf. on Multimedia and Expo (ICME) },
        YEAR = { 2016 },
        ADDRESS = { Seattle, USA },
        MONTH = { July },
        PUBLISHER = { IEEE },
        ABSTRACT = { The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomenon of affective transitions among individuals in online mental health communities. We apply a non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals at different levels of affective disorder. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework for utilizing social media as a sensor of mood and emotional transitions. This work may form the basis of new systems for screening individuals and communities at high risk of mental health problems in online settings. },
        FILE = { :dao_nguyen_venkatesh_phung_icme16 - Discovering Latent Affective Transitions among Individuals in Online Mental Health­related Communities..pdf:PDF },
        OWNER = { dbdao },
        TIMESTAMP = { 2016.03.20 },
    }
C
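    A minimal sketch of the factorization step described in the ICME entry above, assuming scikit-learn; encoding each user's posts as flattened emotion-to-emotion transition counts is a hypothetical stand-in for the paper's features:
        # Sketch: NMF over per-user emotion-transition counts (synthetic data).
        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(0)
        # Rows: 100 users; columns: 6 emotions x 6 emotions = 36 transition types.
        X = rng.poisson(lam=2.0, size=(100, 36)).astype(float)

        model = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
        W = model.fit_transform(X)   # per-user weights over latent transition factors
        H = model.components_        # the latent transition factors themselves
        print(W.shape, H.shape)      # (100, 5) (5, 36)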
  • Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records
    Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge-Based Systems (KBS), 99(1):168-182, May 2016. [ | | pdf]
    The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – polyvascular disease and acute myocardial infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. Besides this, prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy.
    @ARTICLE { li_rana_phung_venkatesh_kbs16hierarchical,
        AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Hierarchical {B}ayesian nonparametric models for knowledge discovery from electronic medical records },
        JOURNAL = { Knowledge-Based Systems (KBS) },
        YEAR = { 2016 },
        VOLUME = { 99 },
        NUMBER = { 1 },
        PAGES = { 168--182 },
        MONTH = { May },
        ISSN = { 0950-7051 },
        ABSTRACT = { The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – polyvascular disease and acute myocardial infarction. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. Besides this, prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy. },
        DOI = { http://dx.doi.org/10.1016/j.knosys.2016.02.005 },
        FILE = { :li_rana_phung_venkatesh_kbs16hierarchical - Hierarchical Bayesian Nonparametric Models for Knowledge Discovery from Electronic Medical Records.pdf:PDF },
        KEYWORDS = { Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction },
        URL = { http://www.sciencedirect.com/science/article/pii/S0950705116000836 },
    }
J
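    A minimal sketch of the distance-decay idea in the wddCRF entry above, with a hypothetical prefix-based proxy for ICD-10 tree distance (the paper's actual distances come from the code tree):
        # Sketch: exponential decay over a crude code-distance proxy.
        import os
        import numpy as np

        def tree_distance(code_a: str, code_b: str) -> int:
            # Hypothetical proxy: ICD-style codes that share a longer prefix
            # are closer in the code tree.
            common = len(os.path.commonprefix([code_a, code_b]))
            return (len(code_a) - common) + (len(code_b) - common)

        def decay_weight(code_a: str, code_b: str, tau: float = 2.0) -> float:
            # Nearby codes get weight near 1, distant codes near 0.
            return float(np.exp(-tree_distance(code_a, code_b) / tau))

        print(decay_weight("I21.0", "I21.9"))   # close codes -> large weight
        print(decay_weight("I21.0", "F32.1"))   # distant codes -> small weight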
  • Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes
    Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D. In 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA), pages 128-140, April 2016. [ | ]
    The hierarchical Dirichlet process (HDP) was originally designed for, and experimented on, a single data channel. In this paper, we enhance its ability to model heterogeneous data by endowing the base measure with a richer, product-space structure. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data, yielding different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity-location-time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We analysed the activities and patterns learned by our model, and visualized, compared and contrasted them with the ground truth to demonstrate the merit of the proposed framework. We further quantitatively evaluated and reported its performance using standard metrics including F1-score, NMI, RI and purity. We also compared the performance of the PS-HDP model with popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem in pervasive and ubiquitous computing applications.
    @INPROCEEDINGS { nguyen_nguyen_venkatesh_phung_mlsda16learning,
        AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },
        TITLE = { Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical {D}irichlet Processes },
        BOOKTITLE = { 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA) },
        YEAR = { 2016 },
        PAGES = { 128--140 },
        MONTH = { April },
        ABSTRACT = { The hierarchical Dirichlet process (HDP) was originally designed for, and experimented on, a single data channel. In this paper, we enhance its ability to model heterogeneous data by endowing the base measure with a richer, product-space structure. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data, yielding different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We analysed the activities and patterns learned by our model, and visualized, compared and contrasted them with the ground truth to demonstrate the merit of the proposed framework. We further quantitatively evaluated and reported its performance using standard metrics including F1-score, NMI, RI and purity. We also compared the performance of the PS-HDP model with popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem in pervasive and ubiquitous computing applications. },
        FILE = { :nguyen_nguyen_venkatesh_phung_mlsda16learning - Learning Multi Faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
    }
C
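    The external clustering metrics named in the PS-HDP entry above (NMI, RI, purity) are easy to reproduce; a minimal sketch assuming scikit-learn, with the adjusted Rand index substituted for the raw Rand index:
        # Sketch: external clustering metrics on toy label assignments.
        import numpy as np
        from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

        def purity(y_true, y_pred):
            # For each predicted cluster, count its most frequent true label.
            y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
            hits = sum(np.bincount(y_true[y_pred == c]).max()
                       for c in np.unique(y_pred))
            return hits / len(y_true)

        y_true = [0, 0, 1, 1, 2, 2]
        y_pred = [0, 0, 1, 2, 2, 2]
        print(normalized_mutual_info_score(y_true, y_pred),
              adjusted_rand_score(y_true, y_pred),
              purity(y_true, y_pred))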
  • Neural Choice by Elimination via Highway Networks
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques, April 2016. [ | ]
    We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model are the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning-to-rank challenge. We demonstrate that the proposed method is competitive against state-of-the-art learning-to-rank methods.
    @INPROCEEDINGS { tran_phung_venkatesh_bmd16neural,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Neural Choice by Elimination via Highway Networks },
        BOOKTITLE = { 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques },
        YEAR = { 2016 },
        MONTH = { April },
        ABSTRACT = { We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model are the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning-to-rank challenge. We demonstrate that the proposed method is competitive against state-of-the-art learning-to-rank methods. },
        FILE = { :tran_phung_venkatesh_bmd16neural - Neural Choice by Elimination Via Highway Networks.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
    }
C
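    A minimal NumPy sketch of the highway-layer building block referenced in the entry above; the gate mixes a nonlinear transform with the identity, and the parameter shapes here are illustrative:
        # Sketch: one highway layer, y = T(x) * H(x) + (1 - T(x)) * x.
        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def highway_layer(x, W_h, b_h, W_t, b_t):
            h = np.tanh(x @ W_h + b_h)    # candidate transform H(x)
            t = sigmoid(x @ W_t + b_t)    # transform gate T(x)
            return t * h + (1.0 - t) * x  # carry the rest of x through unchanged

        rng = np.random.default_rng(0)
        x = rng.normal(size=(4, 8))
        W_h, W_t = 0.1 * rng.normal(size=(8, 8)), 0.1 * rng.normal(size=(8, 8))
        b_h, b_t = np.zeros(8), np.full(8, -1.0)  # negative gate bias favours carrying
        print(highway_layer(x, W_h, b_h, W_t, b_t).shape)  # (4, 8)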
  • DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 30-41, April 2016. [ | | pdf]
    Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy.
    @CONFERENCE { pham_tran_phung_venkatesh_pakdd16deepcare,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { {DeepCare}: A Deep Dynamic Memory Model for Predictive Medicine },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        VOLUME = { 9652 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 30--41 },
        MONTH = { April },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy. },
        DOI = { 10.1007/978-3-319-31750-2_3 },
        FILE = { :pham_tran_phung_venkatesh_pakdd16deepcare - DeepCare_ a Deep Dynamic Memory Model for Predictive Medicine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31750-2_3 },
    }
C
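    A minimal sketch of the time-parameterized forgetting idea in the DeepCare entry above; the specific decay 1/log(e + Δt) is one illustrative choice, not necessarily the paper's exact parameterization:
        # Sketch: scale an LSTM forget gate by elapsed time between visits.
        import numpy as np

        def time_decayed_forget(f_gate, delta_t_days):
            # Memory fades monotonically with the gap since the last record.
            decay = 1.0 / np.log(np.e + delta_t_days)
            return f_gate * decay

        f = np.array([0.9, 0.7])              # forget-gate activations for two cells
        print(time_decayed_forget(f, 1.0))    # recent visit: mild forgetting
        print(time_decayed_forget(f, 180.0))  # 6-month gap: strong forgetting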
  • Sparse Adaptive Multi-Hyperplane Machine
    Nguyen, Khanh, Le, Trung, Nguyen, Vu and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 27-39, April 2016. [ | | pdf]
    The Adaptive Multi-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it offers no principled way to tune the complexity and sparsity of the solution. Addressing sparsity is important for improving generalization, prediction accuracy and computational speed. In this paper, we employ the max-margin principle and a sparsity-inducing approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning and of the original AMM, the proposed SAMM provides the machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid both overfitting and underfitting. We validate our approach on several large benchmark datasets and show that, with its ability to control sparsity, the proposed SAMM yields superior classification accuracy to the original AMM while simultaneously achieving a computational speedup.
    @CONFERENCE { nguyen_le_nguyen_phung_pakdd16sparse,
        AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Phung, Dinh },
        TITLE = { Sparse Adaptive Multi-Hyperplane Machine },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        VOLUME = { 9651 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 27--39 },
        MONTH = { April },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The Adaptive Multi-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it offers no principled way to tune the complexity and sparsity of the solution. Addressing sparsity is important for improving generalization, prediction accuracy and computational speed. In this paper, we employ the max-margin principle and a sparsity-inducing approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning and of the original AMM, the proposed SAMM provides the machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid both overfitting and underfitting. We validate our approach on several large benchmark datasets and show that, with its ability to control sparsity, the proposed SAMM yields superior classification accuracy to the original AMM while simultaneously achieving a computational speedup. },
        DOI = { 10.1007/978-3-319-31753-3_3 },
        FILE = { :nguyen_le_nguyen_phung_pakdd16sparse - Sparse Adaptive Multi Hyperplane Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_3 },
    }
C
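    A minimal sketch of the multi-hyperplane scoring rule that AMM and SAMM share (each class owns several hyperplanes and is scored by its best one); SAMM's sparsity control would additionally shrink rows of W during SGD:
        # Sketch: predict by max-over-hyperplanes per class (toy weights).
        import numpy as np

        def amm_predict(X, W, hyperplane_class):
            scores = X @ W.T                           # (n_samples, n_hyperplanes)
            classes = np.unique(hyperplane_class)
            class_scores = np.stack(
                [scores[:, hyperplane_class == c].max(axis=1) for c in classes],
                axis=1)
            return classes[class_scores.argmax(axis=1)]

        rng = np.random.default_rng(0)
        X = rng.normal(size=(5, 3))
        W = rng.normal(size=(4, 3))                    # 4 hyperplanes in 3-d
        hyperplane_class = np.array([0, 0, 1, 1])      # two hyperplanes per class
        print(amm_predict(X, W, hyperplane_class))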
  • Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework
    Li, Cheng, Gupta, Sunil, Rana, Santu, Luo, Wei, Venkatesh, Svetha, Ashley, David and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 152-164, April 2016. [ | | pdf]
    Cancer treatments cause severe side effects called toxicities, and reducing such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time-series data in that toxicity can be caused by a single treatment on a given day, so it is necessary to consider the effect of the individual data vector causing the toxicity. We model the data before each prediction point using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer type. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. The method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy, in terms of AUC, than state-of-the-art baselines.
    @CONFERENCE { li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity,
        AUTHOR = { Li, Cheng and Gupta, Sunil and Rana, Santu and Luo, Wei and Venkatesh, Svetha and Ashley, David and Phung, Dinh },
        TITLE = { Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        PAGES = { 152--164 },
        MONTH = { April },
        PUBLISHER = { Springer },
        ABSTRACT = { Cancer treatments cause severe side effects called toxicities, and reducing such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time-series data in that toxicity can be caused by a single treatment on a given day, so it is necessary to consider the effect of the individual data vector causing the toxicity. We model the data before each prediction point using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer type. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. The method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy, in terms of AUC, than state-of-the-art baselines. },
        DOI = { 10.1007/978-3-319-31753-3_13 },
        FILE = { :li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity - Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi Task Framework.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_13 },
    }
C
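    A minimal sketch of the bag construction described in the entry above, assuming scikit-learn; pooling a bag's instances into a fixed-length vector is a generic stand-in for the paper's Bayesian multi-task formulation, and the data here are synthetic:
        # Sketch: bags of daily-treatment instances -> pooled features -> classifier.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        bags = [rng.normal(size=(rng.integers(3, 9), 6)) for _ in range(40)]
        y = rng.integers(0, 2, size=40)        # toy toxicity outcomes

        def embed(bag):
            # Mean- and max-pool so bags of any size map to one vector.
            return np.concatenate([bag.mean(axis=0), bag.max(axis=0)])

        X = np.stack([embed(b) for b in bags])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        print(clf.score(X, y))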
  • Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(1):157-188, April 2016. [ | | pdf]
    @ARTICLE { tran_phung_venkatesh_kais16,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 47 },
        NUMBER = { 1 },
        PAGES = { 157--188 },
        MONTH = { April },
        DOI = { 10.1007/s10115-015-0840-9 },
        FILE = { :tran_phung_venkatesh_kais16 - Modelling Human Preferences for Ranking and Collaborative Filtering_ a Probabilistic Ordered Partition Approach.pdf:PDF },
        KEYWORDS = { Preference learning Learning-to-rank Collaborative filtering Probabilistic ordered partition model Set-based ranking Probabilistic reasoning },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.03.02 },
        URL = { http://link.springer.com/article/10.1007%2Fs10115-015-0840-9 },
    }
J
  • Consistency of the Health of the Nation Outcome Scales (HoNOS) at inpatient-to-community transition
    Luo, Wei, Harvey, Richard, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha and Connor, Jason P. BMJ Open, 6(4):e010732, April 2016. [ | | pdf]
    Objectives: The Health of the Nation Outcome Scales (HoNOS) are mandated outcome measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess whether setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care to community centres. Setting: A regional mental health service with both acute and community facilities. Participants: 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with a median period between assessments of 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures: Differences between HoNOS at inpatient discharge and community intake were assessed with Pearson correlation, Cohen's κ and effect size. Results: Inpatient-discharge HoNOS were on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22) and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). The Pearson correlation between the two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions: Systematic change in the HoNOS occurs at the inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient and community HoNOS scores.
    @ARTICLE { luo_harvey_tran_phung_venkatesh_connor_bmj16consistency,
        AUTHOR = { Luo, Wei and Harvey, Richard and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha and Connor, Jason P },
        TITLE = { Consistency of the Health of the Nation Outcome Scales ({HoNOS}) at inpatient-to-community transition },
        JOURNAL = { BMJ Open },
        YEAR = { 2016 },
        VOLUME = { 6 },
        NUMBER = { 4 },
        PAGES = { e010732 },
        MONTH = { April },
        ABSTRACT = { Objectives: The Health of the Nation Outcome Scales (HoNOS) are mandated outcome measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess whether setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care to community centres. Setting: A regional mental health service with both acute and community facilities. Participants: 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with a median period between assessments of 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures: Differences between HoNOS at inpatient discharge and community intake were assessed with Pearson correlation, Cohen's κ and effect size. Results: Inpatient-discharge HoNOS were on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22) and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). The Pearson correlation between the two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions: Systematic change in the HoNOS occurs at the inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient and community HoNOS scores. },
        DOI = { 10.1136/bmjopen-2015-010732 },
        FILE = { :luo_harvey_tran_phung_venkatesh_connor_bmj16consistency - Consistency of the Health of the Nation Outcome Scales (HoNOS) at Inpatient to Community Transition.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        PUBLISHER = { British Medical Journal Publishing Group },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://bmjopen.bmj.com/content/6/4/e010732.full },
    }
J
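    The agreement statistics in the BMJ Open entry above are straightforward to compute; a minimal sketch with SciPy and scikit-learn on synthetic scores (banding the totals before Cohen's κ is an assumption here, not necessarily the paper's procedure):
        # Sketch: Pearson r and Cohen's kappa between paired assessments.
        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.metrics import cohen_kappa_score

        rng = np.random.default_rng(0)
        discharge = rng.integers(0, 30, size=111)    # toy HoNOS totals
        intake = np.clip(discharge + rng.integers(-5, 15, size=111), 0, 48)

        r, p = pearsonr(discharge, intake)
        print(f"Pearson r = {r:.3f} (p = {p:.3g})")

        bands = [6, 12, 18]                          # kappa needs categories
        kappa = cohen_kappa_score(np.digitize(discharge, bands),
                                  np.digitize(intake, bands))
        print(f"Cohen's kappa on banded totals = {kappa:.3f}")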
  • A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression
    Saha, Budhaditya, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), PP(99):1-1, March 2016. [ | | pdf]
    Mental illness has a deep impact on individuals, families and, by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combination; e.g., a patient with an anxiety disorder may also develop depression. Such co-occurring mental health conditions provide the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycholinguistic features expressed in the posts, using these as inputs to our model. We have formulated a machine-learning-based joint modelling framework to classify mental health-related co-occurring online communities from these features. Finally, we performed an empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines.
    @ARTICLE { budhaditya_nguyen_phung_venkatesh_bhi16framework,
        AUTHOR = { Saha, Budhaditya and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2016 },
        VOLUME = { PP },
        NUMBER = { 99 },
        PAGES = { 1-1 },
        MONTH = { March },
        ISSN = { 2168-2194 },
        ABSTRACT = { Mental illness has a deep impact on individuals, families and, by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combination; e.g., a patient with an anxiety disorder may also develop depression. Such co-occurring mental health conditions provide the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycholinguistic features expressed in the posts, using these as inputs to our model. We have formulated a machine-learning-based joint modelling framework to classify mental health-related co-occurring online communities from these features. Finally, we performed an empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines. },
        DOI = { 10.1109/JBHI.2016.2543741 },
        FILE = { :budhaditya_nguyen_phung_venkatesh_bhi16framework - A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression.pdf:PDF },
        KEYWORDS = { Blogs;Correlation;Covariance matrices;Feature extraction;Informatics;Media;Pragmatics },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7436759&tag=1 },
    }
J
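    A minimal sketch of the classification step in the entry above, assuming scikit-learn; TF-IDF is a generic stand-in for the paper's topic and psycholinguistic (LIWC-style) features, and the toy posts and labels are invented:
        # Sketch: classify community posts with a simple text pipeline.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        posts = ["feeling low and tired again", "great run this morning",
                 "cannot sleep, everything feels heavy", "new recipe worked well"]
        labels = [1, 0, 1, 0]   # 1 = depression-related community (toy labels)

        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit(posts, labels)
        print(clf.predict(["so exhausted and hopeless"]))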
  • A new transfer learning framework with application to model-agnostic multi-task learning
    Gupta, Sunil, Rana, Santu, Saha, Budhaditya, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), February 2016. [ | | pdf]
    Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm, which aims to improve performance by jointly modeling multiple related tasks. Although numerous classification and regression models exist in the machine learning literature, most MTL models are built around ridge or logistic regression. Some limited works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all of these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level with any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinnings of the algorithm. Through experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbour, random forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to, or better than, many state-of-the-art MTL and transfer learning baselines.
    @ARTICLE { gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer,
        AUTHOR = { Gupta, Sunil and Rana, Santu and Saha, Budhaditya and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A new transfer learning framework with application to model-agnostic multi-task learning },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        PAGES = { 1--41 },
        MONTH = { February },
        ISSN = { 0219-3116 },
        ABSTRACT = { Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm, which aims to improve performance by jointly modeling multiple related tasks. Although numerous classification and regression models exist in the machine learning literature, most MTL models are built around ridge or logistic regression. Some limited works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all of these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level with any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinnings of the algorithm. Through experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbour, random forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to, or better than, many state-of-the-art MTL and transfer learning baselines. },
        DOI = { 10.1007/s10115-016-0926-z },
        FILE = { :gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer - A New Transfer Learning Framework with Application to Model Agnostic Multi Task Learning.pdf:PDF },
        KEYWORDS = { Multi-task learning Model-agnostic framework Meta algorithm Classification Regression },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://dx.doi.org/10.1007/s10115-016-0926-z },
    }
J
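    A minimal sketch of the data-sharing mechanism the entry above describes: each task's model trains on all tasks' data, with samples weighted by task relatedness. Assumes scikit-learn; the relatedness matrix here is fixed by hand rather than estimated as in the paper:
        # Sketch: model-agnostic MTL via relatedness-weighted sample sharing.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def mtl_fit(tasks, relatedness):
            X = np.vstack([Xj for Xj, _ in tasks])
            y = np.concatenate([yj for _, yj in tasks])
            models = []
            for i in range(len(tasks)):
                w = np.concatenate([np.full(len(tasks[j][1]), relatedness[i][j])
                                    for j in range(len(tasks))])
                models.append(LogisticRegression(max_iter=1000)
                              .fit(X, y, sample_weight=w))
            return models

        rng = np.random.default_rng(0)
        tasks = [(rng.normal(size=(30, 4)), rng.integers(0, 2, 30)) for _ in range(3)]
        R = np.array([[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]])
        models = mtl_fit(tasks, R)
        print(models[0].coef_.shape)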
  • Multiple Task Transfer Learning with Small Sample Sizes
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 46(2):315-342, Feb. 2016. [ | | pdf]
    Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually, our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data level and joint parameter learning.
    @ARTICLE { budhaditya_gupta_venkatesh_phung_kais16multiple,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Multiple Task Transfer Learning with Small Sample Sizes },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 46 },
        NUMBER = { 2 },
        PAGES = { 315--342 },
        MONTH = { Feb. },
        ABSTRACT = { Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually, our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data level and joint parameter learning. },
        DOI = { 10.1007/s10115-015-0821-z },
        FILE = { :budhaditya_gupta_venkatesh_phung_kais16multiple - Multiple Task Transfer Learning with Small Sample Sizes.pdf:PDF },
        KEYWORDS = { Multi-task Transfer learning Optimization Healthcare Data mining Statistical analysis },
        OWNER = { dinh },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://link.springer.com/article/10.1007/s10115-015-0821-z },
    }
J
  • Stabilizing L1-norm Prediction Models by Supervised Feature Grouping
    Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 59(C):149-168, Feb. 2016. [ | | pdf]
    Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, one problem in using them is their high dimensionality. Since much of the information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm-based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that, in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making.
    @ARTICLE { kamkar_gupta_phung_venkatesh_16stabilizing,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing L1-norm Prediction Models by Supervised Feature Grouping },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2016 },
        VOLUME = { 59 },
        NUMBER = { C },
        PAGES = { 149--168 },
        MONTH = { Feb. },
        ISSN = { 1532-0464 },
        ABSTRACT = { Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, one problem in using them is their high dimensionality. Since much of the information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm-based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that, in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. },
        DOI = { http://dx.doi.org/10.1016/j.jbi.2015.11.012 },
        FILE = { :kamkar_gupta_phung_venkatesh_16stabilizing - Stabilizing L1 Norm Prediction Models by Supervised Feature Grouping.pdf:PDF },
        KEYWORDS = { Feature selection, Lasso, Stability, Supervised feature grouping },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046415002804 },
    }
J
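    The instability the entry above targets is easy to observe; a minimal sketch, assuming scikit-learn, that measures how often Lasso selects each feature across bootstrap resamples when two features are highly correlated:
        # Sketch: Lasso selection frequency under bootstrap resampling.
        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(0)
        n, d = 200, 30
        X = rng.normal(size=(n, d))
        X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # near-duplicate feature
        y = X[:, 0] + 0.5 * X[:, 5] + 0.1 * rng.normal(size=n)

        freq = np.zeros(d)
        for _ in range(100):
            idx = rng.integers(0, n, size=n)
            freq += Lasso(alpha=0.05).fit(X[idx], y[idx]).coef_ != 0
        # Correlated features 0 and 1 tend to swap in and out of the support.
        print((freq / 100)[:6].round(2))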
  • Graph-induced restricted Boltzmann machines for document modeling
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Information Sciences, 328(C):60-75, Jan. 2016. [ | | pdf]
    Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – whose underlying graphical model is an undirected bipartite graph. Inference is efficient – document representations can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption, ignoring the inherent relational structure among words. This results in less coherent thematic word groupings. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
    @ARTICLE { nguyen_tran_phung_venkatesh_jis16graph,
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Graph-induced restricted {B}oltzmann machines for document modeling },
        JOURNAL = { Information Sciences },
        YEAR = { 2016 },
        VOLUME = { 328 },
        NUMBER = { C },
        PAGES = { 60--75 },
        MONTH = { Jan. },
        ABSTRACT = { Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – whose underlying graphical model is an undirected bipartite graph. Inference is efficient – document representations can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption, ignoring the inherent relational structure among words. This results in less coherent thematic word groupings. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy. },
        DOI = { 10.1016/j.ins.2015.08.023 },
        FILE = { :nguyen_tran_phung_venkatesh_jis16graph - Graph Induced Restricted Boltzmann Machines for Document Modeling.pdf:PDF },
        KEYWORDS = { Document modeling, Feature group discovery, Restricted Boltzmann machine, Topic coherence, Word graphs },
        OWNER = { dinh },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://dx.doi.org/10.1016/j.ins.2015.08.023 },
    }
J
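    A minimal sketch of a graph-based regularizer of the kind the entry above describes, penalizing differences between the parameters of linked words via the graph Laplacian; the word graph here is a toy three-word chain:
        # Sketch: Laplacian smoothness penalty over per-word weight vectors.
        import numpy as np

        def graph_penalty(W, A):
            # sum_ij A_ij * ||w_i - w_j||^2  ==  2 * tr(W^T L W),  L = D - A.
            L = np.diag(A.sum(axis=1)) - A
            return 2.0 * np.trace(W.T @ L @ W)

        A = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)   # toy word graph: a chain
        rng = np.random.default_rng(0)
        W = rng.normal(size=(3, 4))              # rows: per-word parameters
        print(graph_penalty(W, A))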
2015
  • Differentiating sub-groups of online depression-related communities using textual cues
    Nguyen, Thin, O'Dea, Bridianne, Larsen, Mark, Phung, Dinh, Venkatesh, Svetha and Christensen, Helen. In Intl. Conf. on Web Information Systems Engineering (WISE), pages 216-224, Dec. 2015. [ | | pdf]
    Depression is a highly prevalent mental illness and a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed, or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled, and five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Topics and psycholinguistic cues used as features showed good predictive validity in depression classification. Clear discrimination between writing styles and content, with good predictive power, is an important step in understanding social media and its use in mental health.
    @INPROCEEDINGS { nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating,
        AUTHOR = { Nguyen, Thin and O'Dea, Bridianne and Larsen, Mark and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },
        TITLE = { Differentiating sub-groups of online depression-related communities using textual cues },
        BOOKTITLE = { Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2015 },
        VOLUME = { 9419 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 216--224 },
        MONTH = { Dec. },
        PUBLISHER = { Springer },
        ABSTRACT = { Depression is a highly prevalent mental illness and a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed, or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled, and five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Topics and psycholinguistic cues used as features showed good predictive validity in depression classification. Clear discrimination between writing styles and content, with good predictive power, is an important step in understanding social media and its use in mental health. },
        DOI = { 10.1007/978-3-319-26187-4_17 },
        FILE = { :nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating - Differentiating Sub Groups of Online Depression Related Communities Using Textual Cues.pdf:PDF },
        ISBN = { 978-3-319-11748-5 },
        KEYWORDS = { Web community; Feature extraction; Textual cues; Online depression },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-26187-4_17 },
    }
C
  • Using Twitter to learn about the autism community
    Beykikhoshk, Adham, Arandjelović, Ognjen, Phung, Dinh, Venkatesh, Svetha and Caelli, Terry. Social Network Analysis and Mining (SNAM), 5(1):1-17, December 2015. [ | | pdf]
    Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder are crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of the carers of ASD-affected individuals, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, can be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large dataset of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments that examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.
    @ARTICLE { beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'c}, Ognjen and Phung, Dinh and Venkatesh, Svetha and Caelli, Terry },
        TITLE = { Using {T}witter to learn about the autism community },
        JOURNAL = { Social Network Analysis and Mining (SNAM) },
        YEAR = { 2015 },
        VOLUME = { 5 },
        NUMBER = { 1 },
        PAGES = { 1--17 },
        MONTH = { December },
        ABSTRACT = { Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder are crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of the carers of ASD-affected individuals, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, can be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large dataset of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments that examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work. },
        DOI = { 10.1007/s13278-015-0261-5 },
        FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using - Using Twitter to Learn about the Autism Community.pdf:PDF },
        KEYWORDS = { Social media Big data Asperger’s Mental health Health care Public health ASD },
        OWNER = { dinh },
        PUBLISHER = { Springer Vienna },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://dx.doi.org/10.1007/s13278-015-0261-5 },
    }
J
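    The study above is descriptive rather than algorithmic, but the flavour of the keyword-driven tweet mining it describes can be sketched in a few lines. Everything below is a hypothetical stand-in: the lexicon, the sample corpus and the helper names are illustrative only and are not the paper's actual pipeline or data.

    import re
    from collections import Counter

    ASD_KEYWORDS = {"autism", "asd", "asperger", "aspergers"}  # hypothetical lexicon

    def tokenize(text):
        # Lowercase and split a tweet into simple word tokens.
        return re.findall(r"[a-z']+", text.lower())

    def top_cooccurring_terms(tweets, top_k=10):
        # Count the vocabulary of tweets matching at least one ASD keyword,
        # excluding the keywords themselves.
        counts = Counter()
        for tweet in tweets:
            tokens = tokenize(tweet)
            if ASD_KEYWORDS.intersection(tokens):
                counts.update(t for t in tokens if t not in ASD_KEYWORDS)
        return counts.most_common(top_k)

    sample = [  # tiny stand-in for the 11-million-tweet corpus
        "New dietary advice for autism treatment, does it work?",
        "Our son with ASD started at a new school today",
        "Unrelated tweet about football",
    ]
    print(top_cooccurring_terms(sample))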
  • Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines
    Vellanki, Pratibha, Phung, Dinh, Duong, Thi and Venkatesh, Svetha. In Trends and Applications in Knowledge Discovery and Data Mining, pages 245-257, Cham, Nov. 2015. [ | | pdf]
    @INPROCEEDINGS { vellanki_phung_duong_venkatesh_pakdd2015learning,
        AUTHOR = { Vellanki, Pratibha and Phung, Dinh and Duong, Thi and Venkatesh, Svetha },
        TITLE = { Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted {B}oltzmann Machines },
        BOOKTITLE = { Trends and Applications in Knowledge Discovery and Data Mining },
        YEAR = { 2015 },
        VOLUME = { 9441 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 245--257 },
        ADDRESS = { Cham },
        MONTH = { Nov. },
        PUBLISHER = { Springer },
        DOI = { 10.1007/978-3-319-25660-3_21 },
        FILE = { :vellanki_phung_duong_venkatesh_pakdd2015learning - Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines.pdf:PDF },
        ISBN = { 978-3-319-25660-3 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://dx.doi.org/10.1007/978-3-319-25660-3_21 },
    }
C
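    The abstract is not reproduced above, so the following is only a textbook illustration of the model class named in the title: a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1) on binary treatment indicators. It is a generic sketch, not the paper's model, data or preprocessing.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        def __init__(self, n_visible, n_hidden, lr=0.05):
            self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            self.b = np.zeros(n_visible)  # visible bias
            self.c = np.zeros(n_hidden)   # hidden bias
            self.lr = lr

        def hidden_probs(self, v):
            return sigmoid(v @ self.W + self.c)

        def visible_probs(self, h):
            return sigmoid(h @ self.W.T + self.b)

        def cd1_step(self, v0):
            # One CD-1 update on a batch of binary visible vectors.
            ph0 = self.hidden_probs(v0)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            v1 = (rng.random(v0.shape) < self.visible_probs(h0)).astype(float)
            ph1 = self.hidden_probs(v1)
            self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
            self.b += self.lr * (v0 - v1).mean(axis=0)
            self.c += self.lr * (ph0 - ph1).mean(axis=0)

    # Toy usage: 20 binary "treatment indicator" profiles over 8 variables.
    data = (rng.random((20, 8)) < 0.3).astype(float)
    rbm = RBM(n_visible=8, n_hidden=4)
    for _ in range(200):
        rbm.cd1_step(data)
    profiles = rbm.hidden_probs(data)  # latent profile activations per child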
  • Multi-View Subspace Clustering for Face Images
    Zhang, Xin, Phung, Dinh, Venkatesh, Svetha, Pham, Duc-Son and Liu, Wanquan. In Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA), pages 1-7, Nov. 2015. [ | | pdf]
    In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which breaks its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Because it integrates the affinity matrices of all views, this global affinity matrix can best represent the relationships between clusters, which helps achieve better performance on face clustering. We derive a provably convergent and computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets.
    @INPROCEEDINGS { zhang_phung_venkatesh_pham_liu_dicta15multiview,
        AUTHOR = { Zhang, Xin and Phung, Dinh and Venkatesh, Svetha and Pham, Duc-Son and Liu, Wanquan },
        TITLE = { Multi-View Subspace Clustering for Face Images },
        BOOKTITLE = { Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA) },
        YEAR = { 2015 },
        PAGES = { 1-7 },
        MONTH = { Nov. },
        ABSTRACT = { In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which breaks its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Because it integrates the affinity matrices of all views, this global affinity matrix can best represent the relationships between clusters, which helps achieve better performance on face clustering. We derive a provably convergent and computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets. },
        DOI = { 10.1109/DICTA.2015.7371289 },
        FILE = { :zhang_phung_venkatesh_pham_liu_dicta15multiview - Multi View Subspace Clustering for Face Images.pdf:PDF },
        KEYWORDS = { computer vision;face recognition;pattern clustering;ADMM framework;SSC;affinity matrices;alternating direction method;computer vision applications;computer vision method;convergent algorithm;face clustering;face images;global affinity matrix;latent structure;linear separability assumption;multicamera surveillance;multipliers;multiview data;multiview face datasets;multiview subspace clustering algorithms;sparse subspace clustering performance;visual sensors;Cameras;Clustering algorithms;Computer vision;Face;Loss measurement;Matrix decomposition;Sparse matrices },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7371289 },
    }
C
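    The pipeline in the abstract above can be sketched schematically: build a self-representation affinity per view, fuse them into one global affinity, then spectrally cluster it. For brevity the sketch below uses a closed-form ridge-regularised least-squares self-representation instead of the paper's sparse ADMM formulation, and fuses views by simple averaging; function names and the regularisation constant are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def self_representation_affinity(X, reg=1e-2):
        # X: (n_samples, n_features). Solve X^T ~= X^T C with a small ridge
        # term, zero the diagonal, and symmetrise into an affinity matrix.
        G = X @ X.T
        n = G.shape[0]
        C = np.linalg.solve(G + reg * np.eye(n), G)
        np.fill_diagonal(C, 0.0)
        return 0.5 * (np.abs(C) + np.abs(C.T))

    def multi_view_cluster(views, n_clusters):
        # views: list of (n_samples, d_v) arrays over the same samples.
        # Fuse per-view affinities into one global affinity and cluster it.
        A = sum(self_representation_affinity(X) for X in views) / len(views)
        model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
        return model.fit_predict(A)

    # Toy usage: two views of 3 clusters of 10 points each.
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(3), 10)
    views = [rng.normal(labels[:, None] * 3.0, 1.0, size=(30, 5)) for _ in range(2)]
    print(multi_view_cluster(views, n_clusters=3))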
  • Streaming Variational Inference for Dirichlet Process Mixtures
    Huynh, V., Phung, D. and Venkatesh, S. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf]
    Bayesian nonparametric models are theoretically suitable for learning streaming data, since their model complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications because they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms: one for mixed-membership inference, called TFVB (truncation-free variational Bayes), and one for hard clustering inference, called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.
    @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,
        AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },
        TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },
        BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2015 },
        PAGES = { 237--252 },
        MONTH = { Nov. },
        ABSTRACT = { Bayesian nonparametric models are theoretically suitable for learning streaming data, since their model complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications because they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms: one for mixed-membership inference, called TFVB (truncation-free variational Bayes), and one for hard clustering inference, called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data. },
        FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },
    }
C
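    The key property of a truncation-free streaming algorithm, as described above, is that new mixture components are created on the fly as data arrives, with no truncation level fixed in advance. The DP-means-style hard assignment below is a simple stand-in illustrating that property under those assumptions; it is not a reimplementation of the paper's TFVB or TFME algorithms.

    import numpy as np

    def streaming_dp_cluster(stream, penalty):
        # Assign each arriving point to its nearest centroid, opening a new
        # cluster whenever the squared distance exceeds `penalty` (which
        # plays the role of the DP concentration parameter).
        centroids, counts, labels = [], [], []
        for x in stream:
            if centroids:
                d2 = [float(np.sum((x - c) ** 2)) for c in centroids]
                k = int(np.argmin(d2))
            if not centroids or d2[k] > penalty:
                centroids.append(x.astype(float).copy())
                counts.append(1)
                k = len(centroids) - 1
            else:
                counts[k] += 1
                centroids[k] += (x - centroids[k]) / counts[k]  # running mean
            labels.append(k)
        return np.array(labels), np.array(centroids)

    # Toy usage: a stream drawn from two well-separated Gaussians.
    rng = np.random.default_rng(1)
    stream = np.concatenate([rng.normal(0.0, 0.3, (50, 2)),
                             rng.normal(4.0, 0.3, (50, 2))])
    labels, centroids = streaming_dp_cluster(stream, penalty=4.0)
    print(len(centroids), "clusters discovered")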
  • Understanding toxicities and complications of cancer treatment: A data mining approach
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 431-443, Nov 2015. [ | | pdf]
    @INPROCEEDINGS { nguyen_luo_phung_venkatesh_ai15understanding,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Understanding toxicities and complications of cancer treatment: A data mining approach },
        BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },
        YEAR = { 2015 },
        EDITOR = { Pfahringer, Bernhard and Renz, Jochen },
        VOLUME = { 9457 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 431--443 },
        MONTH = { Nov },
        PUBLISHER = { Springer International Publishing },
        DOI = { 10.1007/978-3-319-26350-2_38 },
        FILE = { :nguyen_luo_phung_venkatesh_ai15understanding - Understanding Toxicities and Complications of Cancer Treatment_ a Data Mining Approach.pdf:PDF },
        LOCATION = { Canberra, ACT, Australia },
        OWNER = { ngdang },
        TIMESTAMP = { 2015.09.15 },
        URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_38 },
    }
C
  • Stable Feature Selection with Support Vector Machines
    Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 298-308, Cham, Nov. 2015. [ | | pdf]
    The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining the SVM with an l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, the l1-norm SVM is unstable in selecting features in the presence of correlated features. We propose a new method that increases the stability of the l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods.
    @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_ai15stable,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable Feature Selection with {S}upport {V}ector {M}achines },
        BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },
        YEAR = { 2015 },
        EDITOR = { Pfahringer, Bernhard and Renz, Jochen },
        VOLUME = { 9457 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 298--308 },
        ADDRESS = { Cham },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining the SVM with an l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, the l1-norm SVM is unstable in selecting features in the presence of correlated features. We propose a new method that increases the stability of the l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods. },
        DOI = { 10.1007/978-3-319-26350-2_26 },
        FILE = { :kamkar_gupta_phung_venkatesh_ai15stable - Stable Feature Selection with Support Vector Machines.pdf:PDF },
        ISBN = { 978-3-319-26350-2 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_26 },
    }
C
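    The objective described in the abstract above combines a hinge loss, an l1 penalty and a correlation-driven coupling between feature weights. The paper alternates between the weight vector and a learned feature covariance; the sketch below instead fixes the coupling matrix to the empirical feature correlations (via a signed-graph Laplacian, which pulls positively correlated features toward equal weights and negatively correlated ones toward opposite weights) and fits w by proximal subgradient descent. It illustrates the shape of the objective, not the paper's algorithm; all names and default values are illustrative assumptions.

    import numpy as np

    def soft_threshold(w, t):
        # Proximal operator of the l1 norm.
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    def stable_l1_svm(X, y, lam1=0.05, lam2=0.05, lr=0.01, iters=500):
        # X: (n, d); y in {-1, +1}. Hinge loss + l1 penalty + a quadratic
        # signed-graph Laplacian term coupling correlated feature weights.
        n, d = X.shape
        R = np.corrcoef(X, rowvar=False)
        np.fill_diagonal(R, 0.0)
        L = np.diag(np.abs(R).sum(axis=1)) - R  # signed-graph Laplacian
        w = np.zeros(d)
        for _ in range(iters):
            margins = y * (X @ w)
            active = margins < 1.0                       # hinge subgradient support
            grad = -(X[active] * y[active, None]).sum(axis=0) / n
            grad += 2.0 * lam2 * (L @ w)                 # correlation-coupling term
            w = soft_threshold(w - lr * grad, lr * lam1)  # proximal l1 step
        return w

    # Toy usage: two highly correlated informative features among noise.
    rng = np.random.default_rng(0)
    z = rng.normal(size=200)
    X = np.column_stack([z, z + 0.05 * rng.normal(size=200),
                         rng.normal(size=(200, 3))])
    y = np.sign(z + 0.1 * rng.normal(size=200))
    print(np.round(stable_l1_svm(X, y), 3))  # the correlated pair should share weight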
  • Exploiting Feature Relationships Towards Stable Feature Selection
    Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. [ | | pdf]
    Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods.
    @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_dsaa15,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Exploiting Feature Relationships Towards Stable Feature Selection },
        BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2015 },
        PAGES = { 1--10 },
        ADDRESS = { Paris, France },
        MONTH = { Oct. },
        ABSTRACT = { Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured vi