Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Please also observe the IEEE, ACM and Springer copyright notices.

2020
  • OTLDA: A Geometry-aware Optimal Transport Approach for Topic Modeling
    Viet Huynh, Ethan Zhao and Dinh Phung. In Proc. of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), dec 2020.
    We present an optimal transport framework for learning topics from textual data. While the celebrated Latent Dirichlet allocation (LDA) topic model and its variants have been applied to many disciplines, they mainly focus on word occurrences and neglect to incorporate semantic regularities in language. Recent works have tried to exploit the semantic relationships between words to bridge this gap, but these models, usually extensions of LDA or the Dirichlet Multinomial mixture (DMM), are tailored to deal effectively with either regular or short documents, not both. The optimal transport distance provides an appealing tool for incorporating the geometry of word semantics into topic modeling, and recent developments in the efficient computation of optimal transport distances further promote its application. In this paper we build on optimal transport theory to naturally exploit the geometric structure of semantically related words in embedding spaces, which leads to more interpretable learned topics. Comprehensive experiments illustrate that the proposed framework outperforms competitive approaches in terms of topic coherence on assorted text corpora comprising both long and short documents. The learned topic representations also lead to better accuracy on downstream classification tasks, which serves as an extrinsic evaluation.
    @INPROCEEDINGS { huynh_etal_nisp20_OTLDA,
        AUTHOR = { Viet Huynh and Ethan Zhao and Dinh Phung },
        BOOKTITLE = { Proc. of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS) },
        TITLE = { {OTLDA}: A Geometry-aware Optimal Transport Approach for Topic Modeling },
        YEAR = { 2020 },
        MONTH = { dec },
        ABSTRACT = { We present an optimal transport framework for learning topics from textual data. While the celebrated Latent Dirichlet allocation (LDA) topic model and its variants have been applied to many disciplines, they mainly focus on word occurrences and neglect to incorporate semantic regularities in language. Recent works have tried to exploit the semantic relationships between words to bridge this gap, but these models, usually extensions of LDA or the Dirichlet Multinomial mixture (DMM), are tailored to deal effectively with either regular or short documents, not both. The optimal transport distance provides an appealing tool for incorporating the geometry of word semantics into topic modeling, and recent developments in the efficient computation of optimal transport distances further promote its application. In this paper we build on optimal transport theory to naturally exploit the geometric structure of semantically related words in embedding spaces, which leads to more interpretable learned topics. Comprehensive experiments illustrate that the proposed framework outperforms competitive approaches in terms of topic coherence on assorted text corpora comprising both long and short documents. The learned topic representations also lead to better accuracy on downstream classification tasks, which serves as an extrinsic evaluation. },
        FILE = { :huynh_etal_nisp20_OTLDA - OTLDA_ a Geometry Aware Optimal Transport Approach for Topic Modeling.pdf:PDF },
        URL = { https://proceedings.neurips.cc/paper/2020/hash/d800149d2f947ad4d64f34668f8b20f6-Abstract.html },
    }
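    The entropic-regularised optimal transport distance at the core of this line of work can be approximated with a few Sinkhorn iterations. Below is a minimal NumPy sketch (an illustration under simplifying assumptions, not the authors' implementation): cost holds pairwise distances between word embeddings, and a, b are two distributions over a toy vocabulary.
        import numpy as np

        def sinkhorn(a, b, cost, reg=0.1, n_iters=200):
            """Entropic-regularised OT cost between histograms a and b (each sums to 1)."""
            K = np.exp(-cost / reg)              # Gibbs kernel from the ground cost
            u = np.ones_like(a)
            for _ in range(n_iters):             # alternating scaling updates
                v = b / (K.T @ u)
                u = a / (K @ v)
            P = u[:, None] * K * v[None, :]      # approximate transport plan
            return np.sum(P * cost)              # transport cost <P, C>

        # toy example: two distributions over a 4-word vocabulary
        rng = np.random.default_rng(0)
        emb = rng.normal(size=(4, 3))            # hypothetical word embeddings
        cost = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
        a = np.array([0.4, 0.3, 0.2, 0.1])
        b = np.array([0.1, 0.2, 0.3, 0.4])
        print(sinkhorn(a, b, cost))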
  • A Self-Attention Network based Node Embedding Model
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2020.
    Although much progress has been made recently, limited research has been conducted for the inductive setting, where embeddings are required for newly unseen nodes – a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE – a novel unsupervised embedding model – whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. As a consequence, SANNE can produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that our unsupervised SANNE obtains state-of-the-art results for the node classification task on benchmark datasets.
    @INPROCEEDINGS { nguyen_etal_ecml20_selfattention,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) },
        TITLE = { A Self-Attention Network based Node Embedding Model },
        YEAR = { 2020 },
        ABSTRACT = { Although much progress has been made recently, limited research has been conducted for the inductive setting, where embeddings are required for newly unseen nodes – a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE – a novel unsupervised embedding model – whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. As a consequence, SANNE can produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that our unsupervised SANNE obtains state-of-the-art results for the node classification task on benchmark datasets. },
    }
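    As a rough sketch of the aggregation step described in the entry above (parameter names and shapes are mine; the published SANNE architecture has further components), one self-attention pass over the vectors of nodes in a sampled random walk:
        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        def self_attention(X, Wq, Wk, Wv):
            """One self-attention pass over node vectors X of shape (walk_len, d)."""
            Q, K, V = X @ Wq, X @ Wk, X @ Wv
            A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # attention over walk positions
            return A @ V                                  # aggregated node representations

        rng = np.random.default_rng(0)
        d = 8
        walk = rng.normal(size=(5, d))                    # vectors of 5 nodes in one walk
        Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
        H = self_attention(walk, Wq, Wk, Wv)              # a feed-forward network follows
        print(H.shape)                                    # (5, 8)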
  • Parameterized Rate-Distortion Stochastic Encoder
    Quan Hoang, Trung Le and Dinh Phung. In Proc. of the 37th International Conference on Machine Learning (ICML), 2020.
    @INPROCEEDINGS { hoang_etal_icml20_parameterized,
        AUTHOR = { Quan Hoang and Trung Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 37th International Conference on Machine Learning (ICML) },
        TITLE = { Parameterized Rate-Distortion Stochastic Encoder },
        YEAR = { 2020 },
    }
  • Deep Generative Models of Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2020.
    In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models.
    @INPROCEEDINGS { zhao_etal_aistats20_deepgenerative,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { Deep Generative Models of Sparse and Overdispersed Discrete Data },
        BOOKTITLE = { Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2020 },
        ABSTRACT = { In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models. },
        FILE = { :zhao_etal_aistats20_deepgenerative - Deep Generative Models of Sparse and Overdispersed Discrete Data.pdf:PDF },
        URL = { https://www.semanticscholar.org/paper/Deep-Generative-Models-of-Sparse-and-Overdispersed-Zhao-Rai/8136c46488875b09e15e89c08bf02698901322a1 },
    }
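    For intuition on why a negative-binomial likelihood suits sparse, overdispersed counts: its variance r*p/(1-p)^2 exceeds its mean r*p/(1-p), whereas a Poisson forces the two to be equal. A hedged sketch of the per-element negative-binomial log-likelihood such a decoder would maximise (the paper's exact parameterisation may differ):
        import numpy as np
        from scipy.special import gammaln

        def nb_log_likelihood(x, r, p):
            """log NB(x; r, p) for counts x, dispersion r > 0 and 0 < p < 1."""
            return (gammaln(x + r) - gammaln(r) - gammaln(x + 1)
                    + r * np.log(1.0 - p) + x * np.log(p))

        x = np.array([0.0, 1.0, 5.0, 40.0])   # sparse, overdispersed counts
        r, p = 0.5, 0.9                       # small r gives heavy-tailed counts
        print(nb_log_likelihood(x, r, p))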
  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }
  • Stein variational gradient descent with variance reduction
    Nhan Dam, Trung Le, Viet Huynh and Dinh Phung. In Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN), jul 2020.
    Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy, but its mini-batch estimator has large variance, compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of the mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements over baseline methods on binary classification (including a pseudo-online learning setting) and regression tasks. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields.
    @INPROCEEDINGS { dam_etal_ijcnn20_steinvariational,
        AUTHOR = { Nhan Dam and Trung Le and Viet Huynh and Dinh Phung },
        BOOKTITLE = { Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Stein variational gradient descent with variance reduction },
        YEAR = { 2020 },
        MONTH = { jul },
        ABSTRACT = { Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy, but its mini-batch estimator has large variance, compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of the mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements over baseline methods on binary classification (including a pseudo-online learning setting) and regression tasks. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields. },
        FILE = { :dam_etal_ijcnn20_steinvariational - Stein Variational Gradient Descent with Variance Reduction.pdf:PDF },
    }
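    The base SVGD update that the entry above builds on is compact enough to sketch; the paper's variance-reduction component is not reproduced here. Particles move along a kernelised Stein direction combining a driving force (kernel-weighted scores) and a repulsive force (kernel gradients):
        import numpy as np

        def svgd_step(X, grad_logp, h=1.0, step=0.1):
            """One SVGD update for particles X of shape (n, d)."""
            n = X.shape[0]
            diff = X[:, None, :] - X[None, :, :]                # pairwise differences
            K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))   # RBF kernel matrix
            grads = grad_logp(X)                                # scores at the particles
            # phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
            phi = (K @ grads + np.einsum('ij,ijd->id', K, diff) / h) / n
            return X + step * phi

        rng = np.random.default_rng(0)
        X = rng.normal(loc=5.0, size=(50, 2))     # particles start far from the target
        for _ in range(300):
            X = svgd_step(X, lambda x: -x)        # target: standard Gaussian score
        print(X.mean(axis=0))                     # drifts close to the origin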
  • OptiGAN: Generative Adversarial Networks for Goal Optimized Sequence Generation
    Mahmoud Hossam, Trung Le, Viet Huynh, Michael Papasimeon and Dinh Phung. In Proc. of the International Joint Conference on Neural Networks (IJCNN), 2020.
    One of the challenging problems in sequence generation tasks is the optimized generation of sequences with specific desired goals. Existing sequential generative models mainly generate sequences to closely mimic the training data, without direct optimization according to desired goals or properties specific to the task. In this paper, we propose OptiGAN, a generative model that incorporates both Generative Adversarial Networks (GANs) and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and sequence generation, where it achieves higher scores, outperforming selected GAN and RL baselines while not sacrificing output sample diversity.
    @INPROCEEDINGS { hossam_etal_ijcnn20_OptiGAN,
        AUTHOR = { Mahmoud Hossam and Trung Le and Viet Huynh and Michael Papasimeon and Dinh Phung },
        BOOKTITLE = { Proc. of the International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { OptiGAN: Generative Adversarial Networks for Goal Optimized Sequence Generation },
        YEAR = { 2020 },
        ABSTRACT = { One of the challenging problems in sequence generation tasks is the optimized generation of sequences with specific desired goals. Existing sequential generative models mainly generate sequences to closely mimic the training data, without direct optimization according to desired goals or properties specific to the task. In this paper, we propose OptiGAN, a generative model that incorporates both Generative Adversarial Networks (GANs) and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and sequence generation, where it achieves higher scores, outperforming selected GAN and RL baselines while not sacrificing output sample diversity. },
        FILE = { :hossam_etal_ijcnn20_OptiGAN - OptiGAN_ Generative Adversarial Networks for Goal Optimized Sequence Generation.pdf:PDF },
    }
  • Code Pointer Network for Binary Function Scope Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague and Dinh Phung. In Proc. of the International Joint Conference on Neural Networks (IJCNN), 2020.
    Function identification is a preliminary step in binary analysis for many applications – malware detection, common vulnerability detection and binary instrumentation, to name a few. In this paper, we propose the Code Pointer Network, which leverages the underlying idea of a pointer network to efficiently and effectively tackle function scope identification – the hardest and most crucial task in function identification. We establish extensive experiments to compare our proposed method with the deep learning based baseline. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art baseline in terms of both predictive performance and running time.
    @INPROCEEDINGS { nguyen_etal_ijcnn20_codepointer,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and Dinh Phung },
        BOOKTITLE = { Proc. of the International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Code Pointer Network for Binary Function Scope Identification },
        YEAR = { 2020 },
        ABSTRACT = { Function identification is a preliminary step in binary analysis for many applications – malware detection, common vulnerability detection and binary instrumentation, to name a few. In this paper, we propose the Code Pointer Network, which leverages the underlying idea of a pointer network to efficiently and effectively tackle function scope identification – the hardest and most crucial task in function identification. We establish extensive experiments to compare our proposed method with the deep learning based baseline. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art baseline in terms of both predictive performance and running time. },
        FILE = { :nguyen_etal_ijcnn20_codepointer - Code Pointer Network for Binary Function Scope Identification.pdf:PDF },
    }
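    The central operation a pointer network contributes here is an attention distribution over input positions (instructions) rather than over a fixed output vocabulary, so the model can "point" at a boundary. A generic sketch with hypothetical names and shapes:
        import numpy as np

        def pointer_distribution(enc, dec, W1, W2, v):
            """Pointer attention: enc (T, d) encoder states, dec (d,) decoder state."""
            u = np.tanh(enc @ W1 + dec @ W2) @ v   # (T,) unnormalised position scores
            e = np.exp(u - u.max())
            return e / e.sum()                     # distribution over input positions

        rng = np.random.default_rng(0)
        T, d = 6, 4
        enc, dec = rng.normal(size=(T, d)), rng.normal(size=d)
        W1, W2, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
        p = pointer_distribution(enc, dec, W1, W2, v)
        print(p.argmax(), round(p.sum(), 6))       # index pointed at, and 1.0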
  • Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection
    Van Nguyen, Trung Le, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020.
    Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects, which requires the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA), which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. The generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space, and has emerged as a building block for deep DA approaches with state-of-the-art performance. However, deep DA approaches using the GAN principle to close the gap are subject to the mode collapsing problem, which negatively impacts predictive performance. Our aim in this paper is to propose the Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD, resolving the mode collapsing problem faced by previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin.
    @INPROCEEDINGS { nguyen_etal_pakdd20_dualcomponent,
        AUTHOR = { Van Nguyen and Trung Le and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects, which requires the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA), which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. The generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space, and has emerged as a building block for deep DA approaches with state-of-the-art performance. However, deep DA approaches using the GAN principle to close the gap are subject to the mode collapsing problem, which negatively impacts predictive performance. Our aim in this paper is to propose the Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD, resolving the mode collapsing problem faced by previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin. },
        FILE = { :nguyen_etal_pakdd20_dualcomponent - Dual Component Deep Domain Adaptation_ a New Approach for Cross Project Software Vulnerability Detection.pdf:PDF },
    }
  • Code Action Network for Binary Function Scope Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020.
    Function identification is a preliminary step in binary analysis for many applications – malware detection, common vulnerability detection and binary instrumentation, to name a few. In this paper, we propose the Code Action Network (CAN), whose key idea is to encode the task of function scope identification as a sequence of three action states: NI (next inclusion), NE (next exclusion) and FE (function end). This allows us to efficiently and effectively tackle function scope identification, the hardest and most crucial task in function identification. A bidirectional Recurrent Neural Network is trained to match binary programs with their sequences of action states. To work out the function scopes in a binary, the binary is first fed to a trained CAN to output its sequence of action states, which can then be decoded to recover the function scopes. We undertake extensive experiments to compare our proposed method with state-of-the-art baselines. Experimental results demonstrate that our proposed method outperforms these baselines in terms of predictive performance on real-world datasets, which include binaries from well-known libraries.
    @INPROCEEDINGS { nguyen_etal_pakdd20_codeaction,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Code Action Network for Binary Function Scope Identification },
        YEAR = { 2020 },
        ABSTRACT = { Function identification is a preliminary step in binary analysis for many applications – malware detection, common vulnerability detection and binary instrumentation, to name a few. In this paper, we propose the Code Action Network (CAN), whose key idea is to encode the task of function scope identification as a sequence of three action states: NI (next inclusion), NE (next exclusion) and FE (function end). This allows us to efficiently and effectively tackle function scope identification, the hardest and most crucial task in function identification. A bidirectional Recurrent Neural Network is trained to match binary programs with their sequences of action states. To work out the function scopes in a binary, the binary is first fed to a trained CAN to output its sequence of action states, which can then be decoded to recover the function scopes. We undertake extensive experiments to compare our proposed method with state-of-the-art baselines. Experimental results demonstrate that our proposed method outperforms these baselines in terms of predictive performance on real-world datasets, which include binaries from well-known libraries. },
        FILE = { :nguyen_etal_pakdd20_codeaction - Code Action Network for Binary Function Scope Identification.pdf:PDF },
    }
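    To make the NI/NE/FE encoding concrete, here is one plausible decoder (my reading of the abstract above, not the authors' code) that turns a predicted action-state sequence back into per-function (start, end) instruction scopes:
        def decode_action_states(states):
            """states[i] in {'NI', 'NE', 'FE'}: instruction i is inside the current
            function, outside any function, or the last instruction of a function."""
            scopes, start = [], None
            for i, s in enumerate(states):
                if s == 'NE':
                    start = None                  # outside any function
                elif start is None:               # 'NI' or 'FE' opens a new scope
                    start = i
                if s == 'FE':
                    scopes.append((start, i))     # close the current function
                    start = None
            return scopes

        # two functions, instructions 0-2 and 4, separated by a non-function byte
        print(decode_action_states(['NI', 'NI', 'FE', 'NE', 'FE']))   # [(0, 2), (4, 4)]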
  • Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection
    Tuan Nguyen, Trung Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John C. Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020.
    Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical, since when using commercial software we usually only possess the binary. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and of kernel methods in learning the characteristics of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results show that our proposed method convincingly outperforms the baselines.
    @INPROCEEDINGS { nguyen_etal_pakdd20_deepcost,
        AUTHOR = { Tuan Nguyen and Trung Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John C Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical, since when using commercial software we usually only possess the binary. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and of kernel methods in learning the characteristics of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results show that our proposed method convincingly outperforms the baselines. },
        FILE = { :nguyen_etal_pakdd20_deepcost - Deep Cost Sensitive Kernel Machine for Binary Software Vulnerability Detection.pdf:PDF },
    }
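    Cost-sensitive learning of the kind this entry builds on can be illustrated with an asymmetric hinge loss in which missing a vulnerable binary (a false negative) is penalised more heavily than raising a false alarm. This generic sketch is not the paper's DCKM formulation:
        import numpy as np

        def cost_sensitive_hinge(y, scores, c_fn=5.0, c_fp=1.0):
            """y in {-1, +1} with +1 = vulnerable; scores are classifier margins.
            c_fn weights errors on vulnerable examples, c_fp errors on clean ones."""
            weights = np.where(y == 1, c_fn, c_fp)
            return np.mean(weights * np.maximum(0.0, 1.0 - y * scores))

        y = np.array([1, 1, -1, -1])
        scores = np.array([-0.5, 2.0, 0.3, -1.2])
        print(cost_sensitive_hinge(y, scores))    # the missed vulnerability dominates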
2019
  • A Bayesian Extension to VAEs for Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of Bayesian Deep Learning (NeurIPS 2019 Workshop), dec 2019.
    @INPROCEEDINGS { zhao_etal_bdl19_bayesianextension,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { A Bayesian Extension to {VAE}s for Discrete Data },
        BOOKTITLE = { Proc. of Bayesian Deep Learning (NeurIPS 2019 Workshop) },
        YEAR = { 2019 },
        MONTH = { dec },
    }
  • Pair-based Uncertainty and Diversity Promoting Early Active Learning for Person Re-identification
    Wenhe Liu, Xiaojun Chang, Ling Chen, Dinh Phung, Yi Yang and Alexander Hauptmann. ACM Transactions on Intelligent Systems and Technology, 2019.
    @ARTICLE { liu_etal_tist19_pairbased,
        AUTHOR = { Wenhe Liu and Xiaojun Chang and Ling Chen and Dinh Phung and Yi Yang and Alexander Hauptmann },
        TITLE = { Pair-based Uncertainty and Diversity Promoting Early Active Learning for Person Re-identification },
        JOURNAL = { ACM Transactions on Intelligent Systems and Technology },
        YEAR = { 2019 },
    }
  • An effective spatial-temporal attention based neural network for traffic flow prediction
    Loan N.N. Do, Hai L. Vu, Bao Q. Vo, Zhiyuan Liu and Dinh Phung. Transportation Research Part C: Emerging Technologies, 108:12-28, 2019.
    Due to its importance in Intelligent Transport Systems (ITS), traffic flow prediction has been the focus of many studies in the last few decades. Existing traffic flow prediction models mainly extract static spatial-temporal correlations, although these correlations are known to be dynamic in traffic networks. Attention-based models have emerged in recent years, mostly in the field of natural language processing, and have resulted in major progress in terms of both accuracy and interpretability. This inspires us to introduce the application of attention to traffic flow prediction. In this study, a deep learning based traffic flow predictor with spatial and temporal attention (STANN) is proposed. The spatial and temporal attentions are used to exploit the spatial dependencies between road segments and the temporal dependencies between time steps, respectively. Experimental results with a real-world traffic dataset demonstrate the superior performance of the proposed model. The results also show that the utilization of multiple data resolutions can help improve prediction accuracy. Furthermore, the proposed model is shown to have potential for improving the understanding of spatial-temporal correlations in a traffic network.
    @ARTICLE { do_etal_trc19_AnEffective,
        AUTHOR = { Loan N.N. Do and Hai L. Vu and Bao Q. Vo and Zhiyuan Liu and Dinh Phung },
        TITLE = { An effective spatial-temporal attention based neural network for traffic flow prediction },
        JOURNAL = { Transportation Research Part C: Emerging Technologies },
        YEAR = { 2019 },
        VOLUME = { 108 },
        PAGES = { 12--28 },
        ISSN = { 0968-090X },
        ABSTRACT = { Due to its importance in Intelligent Transport Systems (ITS), traffic flow prediction has been the focus of many studies in the last few decades. Existing traffic flow prediction models mainly extract static spatial-temporal correlations, although these correlations are known to be dynamic in traffic networks. Attention-based models have emerged in recent years, mostly in the field of natural language processing, and have resulted in major progress in terms of both accuracy and interpretability. This inspires us to introduce the application of attention to traffic flow prediction. In this study, a deep learning based traffic flow predictor with spatial and temporal attention (STANN) is proposed. The spatial and temporal attentions are used to exploit the spatial dependencies between road segments and the temporal dependencies between time steps, respectively. Experimental results with a real-world traffic dataset demonstrate the superior performance of the proposed model. The results also show that the utilization of multiple data resolutions can help improve prediction accuracy. Furthermore, the proposed model is shown to have potential for improving the understanding of spatial-temporal correlations in a traffic network. },
        DOI = { 10.1016/j.trc.2019.09.008 },
        FILE = { :do_etal_trc19_AnEffective - An Effective Spatial Temporal Attention Based Neural Network for Traffic Flow Prediction.pdf:PDF },
        KEYWORDS = { Traffic flow prediction, Traffic flow forecasting, Deep learning, Neural network, Attention },
        URL = { http://www.sciencedirect.com/science/article/pii/S0968090X19301330 },
    }
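    A minimal form of the temporal attention used in predictors like STANN: score each historical time step against the current hidden state and form the context as a weighted sum. This is a generic sketch with hypothetical shapes; the spatial attention over road segments has the same structure:
        import numpy as np

        def temporal_attention(H, q):
            """H: (T, d) hidden states over T past time steps; q: (d,) current state."""
            scores = H @ q / np.sqrt(H.shape[-1])   # affinity of each step with q
            w = np.exp(scores - scores.max())
            w = w / w.sum()                         # attention weights over time steps
            return w @ H, w                         # context vector and weights

        rng = np.random.default_rng(0)
        H, q = rng.normal(size=(12, 16)), rng.normal(size=16)
        context, w = temporal_attention(H, q)
        print(context.shape, w.argmax())            # (16,) and the most attended step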
  • Learning Generative Adversarial Networks from Multiple Data Sources
    Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2823-2829, July 2019.
    Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
    @INPROCEEDINGS { le_etal_ijcai19_learningGAN,
        AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2019 },
        PAGES = { 2823--2829 },
        MONTH = { July },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },
        FILE = { :le_etal_ijcai19_learningGAN - Learning Generative Adversarial Networks from Multiple Data Sources.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/391 },
    }
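    The push-and-pull idea can be written as a composite generator objective: a pull term towards the primary data source plus weighted push terms away from each auxiliary source. The sketch below only evaluates such a loss on given discriminator outputs (actual training needs automatic differentiation), and the paper's exact form may differ:
        import numpy as np

        def p2gan_generator_loss(d_primary, d_aux_list, lam=1.0, eps=1e-8):
            """d_primary: primary discriminator's 'real' probabilities for generated
            samples; d_aux_list: the same from one discriminator per auxiliary source."""
            pull = -np.mean(np.log(d_primary + eps))          # look real to primary
            push = sum(np.mean(np.log(d + eps)) for d in d_aux_list)
            return pull + lam * push                          # generator minimises this

        d_primary = np.array([0.7, 0.4, 0.9])                 # hypothetical outputs
        d_aux = [np.array([0.2, 0.1, 0.3])]                   # low = pushed away
        print(p2gan_generator_loss(d_primary, d_aux))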
  • Three-Player Wasserstein GAN via Amortised Duality
    Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2202-2208, July 2019.
    We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover, since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with three players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternating gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method.
    @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,
        AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },
        YEAR = { 2019 },
        MONTH = { July },
        PAGES = { 2202--2208 },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover, since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with three players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternating gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method. },
        FILE = { :dam_etal_ijcai19_3pwgan - Three Player Wasserstein GAN Via Amortised Duality.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/305 },
    }
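    For reference, the general Kantorovich dual whose inner optimisation the amortised mover approximates (standard optimal transport notation, not necessarily the paper's):
        W_c(P, Q) = \sup_{f} \; \mathbb{E}_{x \sim P}\big[ f(x) \big] + \mathbb{E}_{y \sim Q}\big[ f^{c}(y) \big],
        \qquad f^{c}(y) = \inf_{x} \big( c(x, y) - f(x) \big)
    Reading the abstract in these terms: the sup over f is the critic, the inf inside the c-transform is what the mover amortises, and the generator minimises the whole quantity, which yields the sequential min-max-min game.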
  • Learning How to Active Learn by Dreaming
    Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, jul 2019.
    @INPROCEEDINGS { vu_etal_acl19_learning,
        AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },
        TITLE = { Learning How to Active Learn by Dreaming },
        BOOKTITLE = { Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) },
        YEAR = { 2019 },
        ADDRESS = { Florence, Italy },
        MONTH = { jul },
    }
  • Deep Domain Adaptation for Vulnerable Code Function Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague, Lizhen Qu and Dinh Phung. In Int. Joint Conf. on Neural Networks (IJCNN), 2019.
    Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection, and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to show that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture achieve better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, as well as other baselines.
    @INPROCEEDINGS { van_etal_ijcnn19_deepdomain,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and Lizhen Qu and Dinh Phung },
        TITLE = { Deep Domain Adaptation for Vulnerable Code Function Identification },
        BOOKTITLE = { Int. Joint Conf. on Neural Networks (IJCNN) },
        YEAR = { 2019 },
        ABSTRACT = { Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection, and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to show that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture achieve better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, as well as other baselines. },
        FILE = { :van_etal_ijcnn19_deepdomain - Deep Domain Adaptation for Vulnerable Code Function Identification.pdf:PDF },
    }
  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, jun 2019.
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets, WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets, WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }
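    The capsule-specific ingredients of CapsE are the squash non-linearity and the use of the final capsule's vector length as the plausibility score of a triple. A hedged sketch of just those two pieces (the convolution and dynamic-routing layers are omitted):
        import numpy as np

        def squash(s, eps=1e-9):
            """Capsule squash: keeps the direction, maps the length into [0, 1)."""
            n2 = np.sum(s ** 2)
            return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

        # hypothetical final-capsule input for a (subject, relation, object) triple
        rng = np.random.default_rng(0)
        v = squash(rng.normal(size=10))
        score = np.linalg.norm(v)     # plausibility: longer vector = more plausible
        print(score)                  # always in [0, 1)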
  • Probabilistic Multilevel Clustering via Composite Transportation Distance
    Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, apr 2019.
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_aistats19_probabilistic,
        AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },
        TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },
        BOOKTITLE = { Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2019 },
        ADDRESS = { Okinawa, Japan },
        MONTH = { apr },
        ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },
        FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },
        URL = { https://arxiv.org/abs/1810.11911 },
    }
  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier de Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019.
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier de Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }
  • Robust Anomaly Detection in Videos using Multilevel Representations
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019.
    @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },
        BOOKTITLE = { Proceedings of the Thirty-third AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2019 },
        ADDRESS = { Honolulu, USA },
        FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },
        GROUPS = { Anomaly Detection },
        URL = { https://github.com/SeaOtter/vad_gan },
    }
  • Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review
    Ralph Maddison, Susie Cartledge, Michelle Rogerson, Nicole Sylvia Goedhart, Tarveen Ragbir Singh, Christopher Neil, Dinh Phung and Kylie Ball. JMIR mHealth and uHealth, 7(1):e10371, Jan 2019.
    Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context.
    @ARTICLE { maddison_etal_jmir19_usefulness,
        AUTHOR = { Ralph Maddison and Susie Cartledge and Michelle Rogerson and Nicole Sylvia Goedhart and Tarveen Ragbir Singh and Christopher Neil and Dinh Phung and Kylie Ball },
        JOURNAL = { JMIR mHealth and uHealth },
        TITLE = { Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review },
        YEAR = { 2019 },
        ISSN = { 2291-5222 },
        MONTH = { Jan },
        NUMBER = { 1 },
        PAGES = { e10371 },
        VOLUME = { 7 },
        ABSTRACT = { Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context. },
        DAY = { 03 },
        DOI = { 10.2196/10371 },
        FILE = { :ralph_etal_jmir19_usefulness - Usefulness of Wearable Cameras As a Tool to Enhance Chronic Disease Self Management_ Scoping Review.pdf:PDF },
        KEYWORDS = { eHealth; review; cameras; life-logging; lifestyle behavior; chronic disease },
        URL = { https://mhealth.jmir.org/2019/1/e10371/ },
    }
  • On Deep Domain Adaptation: Some Theoretical Understandings
    Trung Le, Khanh Nguyen, Nhat Ho, Hung Bui and Dinh Phung, jun 2019. [ | | pdf]
    Compared with shallow domain adaptation, recent progress in deep domain adaptation has shown that it can achieve higher predictive performance and a stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly intuitive and powerful; however, limited theoretical understanding has been developed to support its underpinning principle. In this paper, we provide a rigorous framework to explain why it is possible to close the gap between the target and source domains in the joint space. More specifically, we first study the loss incurred when performing transfer learning from the source to the target domain. This provides a theory that explains and generalizes existing work in deep domain adaptation, which was mainly empirical. This enables us to further explain why closing the gap in the joint space can directly minimize the loss incurred for transfer learning between the two domains. To our knowledge, this offers the first theoretical result that characterizes a direct bound on the joint space and the gain of transfer learning via deep domain adaptation.
    @MISC { le_etal_arxiv19_ondeepdomain,
        AUTHOR = { Trung Le and Khanh Nguyen and Nhat Ho and Hung Bui and Dinh Phung },
        TITLE = { On Deep Domain Adaptation: Some Theoretical Understandings },
        MONTH = { jun },
        YEAR = { 2019 },
        ABSTRACT = { Compared with shallow domain adaptation, recent progress in deep domain adaptation has shown that it can achieve higher predictive performance and a stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly intuitive and powerful; however, limited theoretical understanding has been developed to support its underpinning principle. In this paper, we provide a rigorous framework to explain why it is possible to close the gap between the target and source domains in the joint space. More specifically, we first study the loss incurred when performing transfer learning from the source to the target domain. This provides a theory that explains and generalizes existing work in deep domain adaptation, which was mainly empirical. This enables us to further explain why closing the gap in the joint space can directly minimize the loss incurred for transfer learning between the two domains. To our knowledge, this offers the first theoretical result that characterizes a direct bound on the joint space and the gain of transfer learning via deep domain adaptation. },
        ARCHIVEPREFIX = { arXiv },
        JOURNAL = { arXiv },
        URL = { http://arxiv.org/abs/1811.06199 },
    }
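    For orientation, the best-known guarantee of this flavor is the classical bound of Ben-David et al., which controls target risk by source risk plus a divergence between the two domains; the entry above contributes an analogous bound stated directly on the joint space rather than the input space. In standard notation,

        \epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda^* ,

    where \epsilon_S(h) and \epsilon_T(h) are the source and target risks of a hypothesis h, d_{\mathcal{H}\Delta\mathcal{H}} measures the discrepancy between the source and target distributions, and \lambda^* is the risk of the best joint hypothesis.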
  • On Scalable Variant of Wasserstein Barycenter
    Tam Le, Viet Huynh, Nhat Ho, Dinh Phung and Makoto Yamada, 2019. [ | ]
    We study a variant of the Wasserstein barycenter problem, which we refer to as \emph{tree-sliced Wasserstein barycenter}, by leveraging the structure of tree metrics for the ground metrics in the formulation of the Wasserstein distance. Drawing on the tree structure, we propose efficient algorithms for solving the unconstrained and constrained versions of the tree-sliced Wasserstein barycenter. The algorithms have fast computational time and efficient memory usage, especially for high-dimensional settings, while demonstrating favorable results when the tree metrics are appropriately constructed. Experimental results on large-scale synthetic and real datasets from Wasserstein barycenter for documents with word embedding, multilevel clustering, and scalable Bayes problems show the advantages of the tree-sliced Wasserstein barycenter over the (Sinkhorn) Wasserstein barycenter.
    @MISC { le_etal_arxiv19_scalable,
        AUTHOR = { Tam Le and Viet Huynh and Nhat Ho and Dinh Phung and Makoto Yamada },
        TITLE = { On Scalable Variant of Wasserstein Barycenter },
        YEAR = { 2019 },
        ABSTRACT = { We study a variant of the Wasserstein barycenter problem, which we refer to as \emph{tree-sliced Wasserstein barycenter}, by leveraging the structure of tree metrics for the ground metrics in the formulation of the Wasserstein distance. Drawing on the tree structure, we propose efficient algorithms for solving the unconstrained and constrained versions of the tree-sliced Wasserstein barycenter. The algorithms have fast computational time and efficient memory usage, especially for high-dimensional settings, while demonstrating favorable results when the tree metrics are appropriately constructed. Experimental results on large-scale synthetic and real datasets from Wasserstein barycenter for documents with word embedding, multilevel clustering, and scalable Bayes problems show the advantages of the tree-sliced Wasserstein barycenter over the (Sinkhorn) Wasserstein barycenter. },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1910.04483 },
        PRIMARYCLASS = { stat.ML },
    }
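    The entry above rests on the fact that the Wasserstein distance admits a closed form when the ground metric is a tree metric: a weighted sum, over edges, of the absolute difference of the two masses below each edge. A minimal sketch of that closed form (the barycenter algorithms themselves are more involved and assume the tree is given):

        import numpy as np

        def tree_wasserstein(parent, weight, mu, nu):
            """Closed-form W1 distance under a tree metric.

            parent[i] : parent of node i (parent[root] == -1)
            weight[i] : length of the edge (i, parent[i]); ignored at the root
            mu, nu    : probability masses placed on the nodes
            TW(mu, nu) = sum over edges e of weight(e) * |mu(subtree(e)) - nu(subtree(e))|
            """
            n = len(parent)
            diff = np.asarray(mu, float) - np.asarray(nu, float)
            depth = np.zeros(n, int)                 # process children before parents
            for i in range(n):
                j = i
                while parent[j] != -1:
                    depth[i] += 1
                    j = parent[j]
            total = 0.0
            for i in sorted(range(n), key=lambda i: -depth[i]):
                if parent[i] != -1:
                    total += weight[i] * abs(diff[i])
                    diff[parent[i]] += diff[i]       # push subtree mass difference upward
            return total

    For a path graph with unit edge weights this reduces to the familiar one-dimensional formula based on differences of cumulative masses; this near-linear-time evaluation is what lets tree-sliced barycenters scale.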
  • Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions
    He Zhao, Trung Le, Paul Montague, Olivier De Vel, Tamas Abraham and Dinh Phung, 2019. [ | ]
    @MISC { zhao_etal_arxiv19_perturbations,
        AUTHOR = { He Zhao and Trung Le and Paul Montague and Olivier De Vel and Tamas Abraham and Dinh Phung },
        TITLE = { Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1910.01329 },
        PRIMARYCLASS = { cs.LG },
    }
  • Unsupervised Universal Self-Attention Network for Graph Classification
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung, 2019. [ | ]
    @MISC { nguyen_etal_arxiv19_unsupervised,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Unsupervised Universal Self-Attention Network for Graph Classification },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1909.11855 },
        PRIMARYCLASS = { cs.LG },
    }
  • On Efficient Multilevel Clustering via Wasserstein Distances
    Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui and Dinh Phung, 2019. [ | ]
    @MISC { huynh_etal_arxiv19_efficient,
        AUTHOR = { Viet Huynh and Nhat Ho and Nhan Dam and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Dinh Phung },
        TITLE = { On Efficient Multilevel Clustering via Wasserstein Distances },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1909.08787 },
        PRIMARYCLASS = { stat.ML },
    }
2018
  • Model-Based Learning for Point Pattern Data
    Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018. [ | | pdf]
    This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
    @ARTICLE { vo_etal_pr18_modelbased,
        AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },
        JOURNAL = { Pattern Recognition (PR) },
        TITLE = { Model-Based Learning for Point Pattern Data },
        YEAR = { 2018 },
        ISSN = { 0031-3203 },
        PAGES = { 136--151 },
        VOLUME = { 84 },
        ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },
        DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },
        FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },
        KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },
        PUBLISHER = { Elsevier },
        URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },
    }
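    A standard example of such a point-process likelihood (background, not necessarily one of the specific models developed in the paper): a Poisson point process with intensity function \lambda(\cdot) on a window W assigns the point pattern X = \{x_1, \dots, x_n\} the likelihood

        p(X) \;=\; e^{-\int_W \lambda(u)\, du} \prod_{i=1}^{n} \lambda(x_i) ,

    a single density over whole sets, cardinality included, which can then be plugged into likelihood-based classification, novelty detection and clustering exactly as ordinary densities are.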
  • Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data
    Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, aug 2018. [ | ]
    Kernel methods are powerful supervised machine learning models owing to their strong generalization ability, especially their capacity to generalize effectively to unseen data from limited training data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter-tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining significant speedups in total training time compared with its rivals.
    @INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },
        TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },
        BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },
        YEAR = { 2018 },
        ADDRESS = { London, UK },
        MONTH = { aug },
        PUBLISHER = { ACM },
        ABSTRACT = { Kernel methods are powerful supervised machine learning models owing to their strong generalization ability, especially their capacity to generalize effectively to unseen data from limited training data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which brings in the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls of current parameter-tuning practice. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining significant speedups in total training time compared with its rivals. },
        FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },
    }
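    The Stein variational engine used above has a compact generic form. A minimal sketch of one SVGD update with an RBF kernel (the generic update rule only; the paper's BKM posterior and kernel-machine specifics are not reproduced here):

        import numpy as np

        def svgd_step(particles, grad_log_p, step=1e-2, bandwidth=1.0):
            """One SVGD update. particles: (n, d); grad_log_p: (n, d) -> (n, d).
            Fixed bandwidth for simplicity; practical SVGD uses a median heuristic."""
            n = particles.shape[0]
            diffs = particles[:, None, :] - particles[None, :, :]      # x_i - x_j, shape (n, n, d)
            K = np.exp(-(diffs ** 2).sum(-1) / (2 * bandwidth ** 2))   # RBF kernel matrix
            grads = grad_log_p(particles)                              # score at each particle
            drive = K @ grads / n                                      # kernel-weighted scores
            repulse = (K[:, :, None] * diffs).sum(axis=1) / (n * bandwidth ** 2)
            return particles + step * (drive + repulse)                # attract + spread out

    With grad_log_p = lambda x: -x (a standard normal target), repeated calls spread the particles into an approximate Gaussian sample; the repulsive term is what keeps them from collapsing onto the mode.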
  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. [ | | pdf]
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a similar spirit to the original GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators' distributions and the empirical data distribution is minimal, whilst the JSD among generators' distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a similar spirit to the original GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators' distributions and the empirical data distribution is minimal, whilst the JSD among generators' distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }
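    The mixture mechanism described above is simple to state: the model distribution is the uniform mixture (1/K) of the K generators' distributions, and a sample is produced by first picking a generator at random. A minimal sketch of that sampling step (the toy generators here are hypothetical stand-ins):

        import numpy as np

        def sample_mixture(generators, n, noise_dim, seed=0):
            """Draw n samples from a uniform mixture of generators; also return
            which generator produced each sample (what MGAN's classifier recovers)."""
            rng = np.random.default_rng(seed)
            idx = rng.integers(0, len(generators), size=n)   # uniform generator choice
            z = rng.standard_normal((n, noise_dim))          # shared noise prior
            x = np.stack([generators[k](z_k) for k, z_k in zip(idx, z)])
            return x, idx

        gens = [lambda z: z + 2.0, lambda z: z - 2.0]        # two toy "generators"
        x, idx = sample_mixture(gens, n=8, noise_dim=2)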
  • Geometric enclosing networks
    Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 2355-2361, July 2018. [ | ]
    Training models to generate data has increasingly attracted research attention and has become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that data generated are also lying on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapsing, and efficient learning of data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data.
    @INPROCEEDINGS { le_etal_ijcai18_geometric,
        AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Geometric enclosing networks },
        BOOKTITLE = { Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, {IJCAI-18} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        PAGES = { 2355--2361 },
        YEAR = { 2018 },
        MONTH = { July },
        ABSTRACT = { Training models to generate data has increasingly attracted research attention and has become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that data generated are also lying on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapsing, and efficient learning of data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data. },
        FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },
    }
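    The minimal enclosing ball principle that GEN borrows is, in its classical soft form, the following program over a center c, radius R and slacks \xi_i for the mapped points \phi(x_i) (the textbook formulation, not GEN's full objective):

        \min_{R,\, c,\, \xi}\;\; R^2 + C \sum_i \xi_i
        \qquad \text{s.t.} \quad \lVert \phi(x_i) - c \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0 ,

    with GEN requiring that training data and generator outputs, once mapped to the feature space, satisfy the same enclosure.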
  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [ | | pdf]
    We introduce a novel embedding method for the knowledge base completion task. Our approach advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters of the same shape 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets, WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce a novel embedding method for the knowledge base completion task. Our approach advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters of the same shape 1x3 are operated over the input matrix to produce different feature maps which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on two current benchmark datasets, WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }
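    The scoring function described above is short enough to sketch directly. A minimal version, assuming a ReLU nonlinearity after the convolution (training loss and initialisation are not reproduced):

        import numpy as np

        def convkb_score(h, r, t, filters, w):
            """h, r, t : (k,) embeddings; filters : (m, 3) bank of 1x3 filters;
            w : (m * k,) weight vector for the final dot product."""
            A = np.stack([h, r, t], axis=1)        # k x 3 matrix, one column per element
            maps = np.maximum(A @ filters.T, 0.0)  # (k, m): each filter slides over the k rows
            v = maps.T.reshape(-1)                 # concatenate the m feature maps
            return float(v @ w)                    # dot product gives the triple's score

        rng = np.random.default_rng(1)
        k, m = 4, 3
        h, r, t = rng.standard_normal((3, k))
        s = convkb_score(h, r, t, rng.standard_normal((m, 3)), rng.standard_normal(m * k))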
  • Text Generation with Deep Variational GAN
    Mahmoud Hossam, Trung Le, Michael Papasimeon, Viet Huynh and Dinh Phung. In 32nd Neural Information Processing Systems (NIPS) Workshop on Bayesian Deep Learning, 2018. [ | ]
    Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, the issue of mode collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem of mode collapse in a principled approach. We change the standard GAN objective to maximize a variational lower bound of the log-likelihood while minimizing the Jensen-Shannon divergence between data and model distributions. We experiment with our model on the text generation task and show that it can generate realistic text with high diversity.
    @INPROCEEDINGS { hossam_etal_bdl18_textgeneration,
        AUTHOR = { Mahmoud Hossam and Trung Le and Michael Papasimeon and Viet Huynh and Dinh Phung },
        TITLE = { Text Generation with Deep Variational {GAN} },
        BOOKTITLE = { 32nd Neural Information Processing Systems (NIPS) Workshop on Bayesian Deep Learning },
        YEAR = { 2018 },
        ABSTRACT = { Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, the issue of mode collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem of mode collapse in a principled approach. We change the standard GAN objective to maximize a variational lower bound of the log-likelihood while minimizing the Jensen-Shannon divergence between data and model distributions. We experiment with our model on the text generation task and show that it can generate realistic text with high diversity. },
        FILE = { :hossam_etal_bdl18_textgeneration - Text Generation with Deep Variational GAN.pdf:PDF },
    }
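    The "variational lower bound of the log-likelihood" invoked above is the standard evidence lower bound; with approximate posterior q_\phi(z|x) and prior p(z),

        \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\Vert\, p(z)\big) ,

    and the stated objective trades this reconstruction-oriented term off against the GAN's Jensen-Shannon divergence between the data and model distributions.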
  • Batch-normalized Deep Boltzmann Machines
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Asian Conference on Machine Learning (ACML), Beijing, China, 2018. [ | ]
    @INPROCEEDINGS { vu_etal_acml18_batchnormalized,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Batch-normalized Deep {Boltzmann} Machines },
        BOOKTITLE = { Proceedings of the Asian Conference on Machine Learning (ACML) },
        YEAR = { 2018 },
        ADDRESS = { Beijing, China },
        OWNER = { hungv },
        TIMESTAMP = { 2018.03.22 },
    }
C
  • Clustering Induced Kernel Learning
    Khanh Nguyen, Nhan Dam, Trung Le, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 10th Asian Conference on Machine Learning (ACML), pages 129-144, Nov 2018. [ | | pdf]
    Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there is a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, existing approaches to combining kernels cannot exploit the clustering structure carried in data, especially when data are heterogeneous. In this work, we leverage Bayesian nonparametric modeling (i.e., automatically growing the set of kernel functions) to develop a new framework that enjoys a nonparametric flavor in the context of multiple kernel learning. In particular, we propose the \emph{Clustering Induced Kernel Learning} (CIK) method, which can automatically discover clustering structure from the data and simultaneously train a single kernel machine to fit the data in each discovered cluster. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have a complex clustering structure with different preferred kernels.
    @INPROCEEDINGS { nguyen_etal_acml18_clustering,
        AUTHOR = { Nguyen, Khanh and Dam, Nhan and Le, Trung and Nguyen, {Tu Dinh} and Phung, Dinh },
        TITLE = { Clustering Induced Kernel Learning },
        BOOKTITLE = { Proc. of the 10th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2018 },
        EDITOR = { Zhu, Jun and Takeuchi, Ichiro },
        VOLUME = { 95 },
        SERIES = { Proceedings of Machine Learning Research },
        PAGES = { 129--144 },
        MONTH = { 14--16 Nov },
        PUBLISHER = { PMLR },
        ABSTRACT = { Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there is a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, existing approaches to combining kernels cannot exploit the clustering structure carried in data, especially when data are heterogeneous. In this work, we leverage Bayesian nonparametric modeling (i.e., automatically growing the set of kernel functions) to develop a new framework that enjoys a nonparametric flavor in the context of multiple kernel learning. In particular, we propose the \emph{Clustering Induced Kernel Learning} (CIK) method, which can automatically discover clustering structure from the data and simultaneously train a single kernel machine to fit the data in each discovered cluster. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have a complex clustering structure with different preferred kernels. },
        FILE = { :nguyen_etal_acml18_clustering - Clustering Induced Kernel Learning.pdf:PDF;nguyen18a.pdf:http\://proceedings.mlr.press/v95/nguyen18a/nguyen18a.pdf:PDF },
        URL = { http://proceedings.mlr.press/v95/nguyen18a.html },
    }
  • LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment
    Dang Nguyen, Wei Luo, Dinh Phung and Svetha Venkatesh. Knowledge-Based Systems, 2018. [ | ]
    Cancer is a worldwide problem and one of the leading causes of death. The increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understandings of cancer treatment toxicities are often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is defined as a rule where the diagnosis codes in the right-hand side (e.g., a combination of toxicities/complications) temporally occur after the diagnosis codes in the left-hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the parent-child relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime.
    @ARTICLE { nguyen_kbs18_ltarm,
        AUTHOR = { Dang Nguyen and Wei Luo and Dinh Phung and Svetha Venkatesh },
        TITLE = { {LTARM}: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment },
        JOURNAL = { Knowledge-Based Systems },
        YEAR = { 2018 },
        ABSTRACT = { Cancer is a worldwide problem and one of the leading causes of death. The increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understandings of cancer treatment toxicities are often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is defined as a rule where the diagnosis codes in the right-hand side (e.g., a combination of toxicities/complications) temporally occur after the diagnosis codes in the left-hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the parent-child relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime. },
        DOI = { https://doi.org/10.1016/j.knosys.2018.07.031 },
        FILE = { :nguyen_kbs18_ltarm - LTARM_ a Novel Temporal Association Rule Mining Method to Understand Toxicities in a Routine Cancer Treatment.pdf:PDF },
    }
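    On the definition above, a temporal rule LHS -> RHS can be checked naively per patient; one plausible reading of "temporally occur after" is that every RHS code appears only after all LHS codes have appeared. A brute-force sketch of confidence under that reading (illustrative only; LTARM's lattice exists precisely to avoid this enumeration):

        def tar_confidence(records, lhs, rhs):
            """records: one list of (time, diagnosis_code) pairs per patient."""
            covered = fired = 0
            for rec in records:
                first = {}
                for t, c in rec:                        # first occurrence of each code
                    if c not in first or t < first[c]:
                        first[c] = t
                if not all(c in first for c in lhs):
                    continue                            # patient never had the full LHS
                covered += 1
                t0 = max(first[c] for c in lhs)         # moment the LHS became complete
                later = {c for t, c in rec if t > t0}
                if all(c in later for c in rhs):
                    fired += 1
            return fired / covered if covered else 0.0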
  • Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web
    Hung Nguyen, Van Nguyen, Thin Nguyen, Mark Larsen, Bridianne O'Dea, Duc Thanh Nguyen, Trung Le, Dinh Phung, Svetha Venkatesh and Helen Christensen. In Proc. of the Int. Conf. on Web Information Systems Engineering (WISE), Springer, 2018. [ | ]
    Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about non-textual features and their links with the disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. Datasets with labels relevant to mental health scores, such as emotional scores, are also employed to improve the performance in predicting mental health scores. This work introduces a deep neural network-based method integrating sub-networks for jointly predicting affective scores and mental health outcomes from images. Experimental results show that, in predicting both emotion and mental health scores, (1) deep features largely outperform handcrafted ones and (2) the proposed network achieves better performance compared with separate networks.
    @INCOLLECTION { nguyen_etal_wise18_jointly,
        AUTHOR = { Hung Nguyen and Van Nguyen and Thin Nguyen and Mark Larsen and Bridianne O'Dea and Duc Thanh Nguyen and Trung Le and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web },
        BOOKTITLE = { Proc. of the Int. Conf. on Web Information Systems Engineering (WISE) },
        PUBLISHER = { Springer },
        YEAR = { 2018 },
        SERIES = { Lecture Notes in Computer Science },
        ABSTRACT = { Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about non-textual features and their links with the disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. Datasets with labels relevant to mental health scores, such as emotional scores, are also employed to improve the performance in predicting mental health scores. This work introduces a deep neural network-based method integrating sub-networks for jointly predicting affective scores and mental health outcomes from images. Experimental results show that, in predicting both emotion and mental health scores, (1) deep features largely outperform handcrafted ones and (2) the proposed network achieves better performance compared with separate networks. },
        FILE = { :nguyen_etal_wise18_jointly - Jointly Predicting Affective and Mental Health Scores Using Deep Neural Networks of Visual Cues on the Web.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.28 },
    }
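    A generic shared-trunk, two-head regression network in the spirit of the joint prediction described above (layer sizes and the linear trunk are made-up placeholders, not the paper's architecture):

        import torch
        import torch.nn as nn

        class TwoHeadNet(nn.Module):
            """Shared feature trunk with one regression head per target."""
            def __init__(self, feat_dim=512, hidden=128):
                super().__init__()
                self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
                self.affect_head = nn.Linear(hidden, 1)   # affective score
                self.mh_head = nn.Linear(hidden, 1)       # mental health score
            def forward(self, x):
                h = self.trunk(x)
                return self.affect_head(h), self.mh_head(h)

    Joint training then minimises the sum of the two regression losses, so the shared trunk is shaped by both tasks at once, which is one natural reading of why the joint network beats separate networks in the abstract's result (2).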
  • Learning Graph Representation via Frequent Subgraphs
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. (Student travel award). [ | ]
    @INPROCEEDINGS { nguyen_etal_sdm18_learning,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Learning Graph Representation via Frequent Subgraphs },
        BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },
        YEAR = { 2018 },
        PUBLISHER = { SIAM },
        NOTE = { Student travel award },
        FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.01.12 },
    }
  • Sqn2Vec: Learning Sequence Representation via Sequential Patterns with a Gap Constraint
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In ECML-PKDD, 2018. (Runner-up Best Student Machine Learning Paper Award). [ | ]
    When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.
    @INPROCEEDINGS { nguyen_etal_ecml18_sqn2vec,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { {Sqn2Vec}: Learning Sequence Representation via Sequential Patterns with a Gap Constraint },
        BOOKTITLE = { ECML-PKDD },
        YEAR = { 2018 },
        NOTE = { Runner-up Best Student Machine Learning Paper Award },
        ABSTRACT = { When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization. },
        FILE = { :nguyen_etal_ecml18_sqn2vec - Sqn2Vec_ Learning Sequence Representation Via Sequential Patterns with a Gap Constraint.pdf:PDF },
    }
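    The pipeline above has two moving parts: testing whether a sequential pattern occurs in a sequence under a gap constraint, and embedding each sequence from a "document" made of its symbols plus the patterns it contains. A minimal sketch, assuming the gensim 4 Doc2Vec API and taking the pattern list as already mined (the mining step itself is omitted):

        from gensim.models.doc2vec import Doc2Vec, TaggedDocument

        def occurs_with_gap(seq, pat, gap, prev=None):
            """True if pat is a subsequence of seq with at most `gap` symbols
            between consecutive matched positions (backtracking; fine for short patterns)."""
            if not pat:
                return True
            lo = 0 if prev is None else prev + 1
            hi = len(seq) if prev is None else min(len(seq), prev + gap + 2)
            return any(seq[i] == pat[0] and occurs_with_gap(seq, pat[1:], gap, i)
                       for i in range(lo, hi))

        seqs = [list("abcab"), list("abab"), list("caac")]
        patterns = [("a", "b"), ("a", "a")]                  # stand-ins for mined SPs
        docs = [TaggedDocument(words=s + ["->".join(p) for p in patterns
                                          if occurs_with_gap(s, list(p), gap=1)],
                               tags=[str(i)])
                for i, s in enumerate(seqs)]
        model = Doc2Vec(docs, vector_size=16, min_count=1, dm=0, epochs=50)  # PV-DBOW
        vec = model.dv["0"]                                  # embedding of the first sequence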
  • A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization
    Dai Quoc Nguyen, Dat Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. Semantic Web journal (SWJ), 2018. [ | | pdf]
    In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on the benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to the search personalization problem, which aims to tailor the search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performance than the standard ranker as well as up-to-date search personalization baselines.
    @ARTICLE { nguyen_etal_swj18_convolutional,
        AUTHOR = { Dai Quoc Nguyen and Dat Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization },
        JOURNAL = { Semantic Web journal (SWJ) },
        YEAR = { 2018 },
        ABSTRACT = { In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on the benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to the search personalization problem, which aims to tailor the search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performance than the standard ranker as well as up-to-date search personalization baselines. },
        FILE = { :nguyen_etal_swj18_convolutional - A Convolutional Neural Network Based Model for Knowledge Base Completion and Its Application to Search Personalization.pdf:PDF },
        URL = { http://www.semantic-web-journal.net/system/files/swj1867.pdf },
    }
  • GoGP: Scalable Geometric-based Gaussian Process for Online Regression
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. Knowledge and Information Systems (KAIS), may 2018. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @ARTICLE { le_etal_kais18_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Scalable Geometric-based Gaussian Process for Online Regression },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2018 },
        MONTH = { may },
        ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_kais18_gogp - GoGP_ Scalable Geometric Based Gaussian Process for Online Regression.pdf:PDF },
    }
  • Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding
    Dang Nguyen, Wei Luo, Svetha Venkatesh and Dinh Phung. Journal of Medical Systems (JMS), 42(5):94, April 2018. [ | | pdf]
    Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
    @ARTICLE { nguyen_etal_jms18_effective,
        AUTHOR = { Dang Nguyen and Wei Luo and Svetha Venkatesh and Dinh Phung },
        TITLE = { Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding },
        JOURNAL = { Journal of Medical Systems (JMS) },
        YEAR = { 2018 },
        VOLUME = { 42 },
        NUMBER = { 5 },
        PAGES = { 94 },
        MONTH = { April },
        ABSTRACT = { Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data. },
        FILE = { :nguyen_etal_jms18_effective - Effective Identification of Similar Patients through Sequential Matching Over ICD Code Embedding.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.03.29 },
        URL = { https://link.springer.com/article/10.1007/s10916-018-0951-4 },
    }
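    The paper's exact matcher is not reproduced here; as one illustrative way to combine code embeddings with sequential order (an assumption for illustration, not the published method), dynamic time warping over pairwise embedding distances yields an order-aware dissimilarity between two ICD code sequences:

        import numpy as np

        def dtw_embed(a, b, emb):
            """a, b: lists of ICD codes; emb: dict mapping each code to a vector."""
            A = [np.asarray(emb[c]) for c in a]
            B = [np.asarray(emb[c]) for c in b]
            D = np.full((len(A) + 1, len(B) + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, len(A) + 1):
                for j in range(1, len(B) + 1):
                    cost = np.linalg.norm(A[i - 1] - B[j - 1])   # embedding distance
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[len(A), len(B)]

    Unlike a bag-of-codes comparison, the warping path respects the order in which diagnoses were recorded, which is the property the abstract argues matters for matching.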
  • Bayesian Multi-Hyperplane Machine for Pattern Recognition
    Khanh Nguyen, Trung Le, Tu Nguyen and Dinh Phung. In Proc. of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, aug 2018. [ | ]
    Existing multi-hyperplane machine approaches deal with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, they require an excessively time-consuming search to find the optimal set of hyper-parameters. Another serious drawback is that the solution is often suboptimal, since the optimal choice for the hyper-parameters is likely to lie outside the search space due to the space discretization step required in grid search. To address these challenges, we propose in this paper the BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective and aims to construct an alternative probabilistic view in such a way that its maximum a posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and to augment auxiliary variables to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without any parameter tuning, and in achieving accuracies comparable with state-of-the-art baselines; in the meantime, our model can seamlessly handle large-scale datasets.
    @INPROCEEDINGS { nguyen_etal_icpr18_bayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Dinh Phung },
        TITLE = { Bayesian Multi-Hyperplane Machine for Pattern Recognition },
        BOOKTITLE = { Proc. of the 24th International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2018 },
        ADDRESS = { Beijing, China },
        MONTH = { aug },
        ABSTRACT = { Existing multi-hyperplane machine approaches deal with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, they require an excessively time-consuming search to find the optimal set of hyper-parameters. Another serious drawback is that the solution is often suboptimal, since the optimal choice for the hyper-parameters is likely to lie outside the search space due to the space discretization step required in grid search. To address these challenges, we propose in this paper the BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective and aims to construct an alternative probabilistic view in such a way that its maximum a posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and to augment auxiliary variables to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without any parameter tuning, and in achieving accuracies comparable with state-of-the-art baselines; in the meantime, our model can seamlessly handle large-scale datasets. },
        FILE = { :nguyen_etal_icpr18_bayesian - Bayesian Multi Hyperplane Machine for Pattern Recognition.pdf:PDF },
    }
2017
  • Dual Discriminator Generative Adversarial Nets
    Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS), pages 2667-2677, USA, 2017. [ | | pdf]
    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it forms the analogue of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database. (The combined objective is sketched after the BibTeX record below.)
    @INPROCEEDINGS { tu_etal_nips17_d2gan,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },
        TITLE = { Dual Discriminator Generative Adversarial Nets },
        BOOKTITLE = { Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS) },
        YEAR = { 2017 },
        SERIES = { NIPS'17 },
        PAGES = { 2667--2677 },
        ADDRESS = { USA },
        PUBLISHER = { Curran Associates Inc. },
        ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it forms the analogue of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database. },
        ACMID = { 3295027 },
        FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },
        ISBN = { 978-1-5108-6096-4 },
        LOCATION = { Long Beach, California, USA },
        NUMPAGES = { 11 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.06 },
        URL = { http://dl.acm.org/citation.cfm?id=3294996.3295027 },
    }
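    The three-player objective, as we recall the paper's formulation (a paraphrase, not a verbatim excerpt): the discriminators D1 and D2 output positive reals, and alpha, beta weight the two divergences.

        \min_{G}\,\max_{D_1,D_2}\; \mathcal{J}(G,D_1,D_2)
          = \alpha\,\mathbb{E}_{x\sim P_{\mathrm{data}}}\!\left[\log D_1(x)\right]
          + \mathbb{E}_{z\sim P_z}\!\left[-D_1(G(z))\right]
          + \mathbb{E}_{x\sim P_{\mathrm{data}}}\!\left[-D_2(x)\right]
          + \beta\,\mathbb{E}_{z\sim P_z}\!\left[\log D_2(G(z))\right]

    At the optimal discriminators, minimizing over G reduces, up to additive constants, to minimizing alpha KL(P_data || P_G) + beta KL(P_G || P_data), which is the mechanism the abstract describes.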
  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior, which is rapid and stable while achieving lower errors, is guaranteed through our theoretical analysis.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior, which is rapid and stable while achieving lower errors, is guaranteed through our theoretical analysis. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }
  • Supervised Restricted Boltzmann Machines
    Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017. [ | | pdf]
    We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning aiming at data generation.
    @INPROCEEDINGS { nguyen_etal_uai17supervised,
        AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },
        TITLE = { Supervised Restricted Boltzmann Machines },
        BOOKTITLE = { In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning aiming at data generation. },
        FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.08.29 },
        URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },
    }
  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017. [ | | pdf]
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. (A barycenter computation is sketched after the BibTeX record below.)
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }
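    The recurring primitive in this formulation is the Wasserstein barycenter, the measure minimizing the weighted sum of Wasserstein distances to a set of input measures. Below is a generic, entropic-regularized barycenter computed with the POT library; it illustrates the building block only and is not the authors' implementation (the support, cost matrix and weights are toy choices).

        # Generic Wasserstein barycenter of two 1-D histograms with POT.
        import numpy as np
        import ot

        n = 50                                        # support size
        x = np.arange(n, dtype=np.float64).reshape(-1, 1)
        M = ot.dist(x, x)                             # squared Euclidean cost
        M /= M.max()

        a1 = ot.datasets.make_1D_gauss(n, m=15, s=5)  # two input measures
        a2 = ot.datasets.make_1D_gauss(n, m=35, s=5)
        A = np.vstack((a1, a2)).T                     # one histogram per column

        bary = ot.bregman.barycenter(A, M, reg=1e-2, weights=np.array([0.5, 0.5]))
        print(bary.sum())                             # a valid probability vector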
  • Approximation Vector Machines for Large-scale Online Learning
    Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. [ | | pdf]
    One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard its risk of compromising performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for the classification task, and l1, l2, and ϵ-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and the regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up, due to the ability of the proposed AVM to maintain the model size. (The approximation step is sketched after the BibTeX record below.)
    @ARTICLE { le_etal_jmlr17approximation,
        AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },
        TITLE = { Approximation Vector Machines for Large-scale Online Learning },
        JOURNAL = { Journal of Machine Learning Research (JMLR) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and l1, l2, and ϵ-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size. },
        FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },
        KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },
        URL = { https://arxiv.org/abs/1604.06518 },
    }
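    A compact sketch of the approximation step described in the abstract: an incoming instance whose nearest "core point" lies within a threshold delta contributes its update through that core point, so the model size stays bounded. The RBF kernel, hinge loss, step sizes and shrinkage below are illustrative assumptions, not the paper's exact algorithm.

        # Illustrative online learner with a bounded core set (AVM-style idea).
        import numpy as np

        class AVMSketch:
            def __init__(self, delta=0.5, gamma=1.0, eta=0.1, lam=1e-3):
                self.delta, self.gamma, self.eta, self.lam = delta, gamma, eta, lam
                self.core, self.alpha = [], []    # core points and coefficients

            def _k(self, x, y):                   # RBF kernel (assumed)
                return np.exp(-self.gamma * np.sum((x - y) ** 2))

            def predict(self, x):
                return sum(a * self._k(c, x) for c, a in zip(self.core, self.alpha))

            def partial_fit(self, x, y):          # y in {-1, +1}
                x = np.asarray(x, dtype=float)
                # shrinkage from the regularizer, then a hinge-loss step
                self.alpha = [(1 - self.eta * self.lam) * a for a in self.alpha]
                if y * self.predict(x) < 1:       # margin violation
                    if self.core:
                        d = [np.linalg.norm(x - c) for c in self.core]
                        j = int(np.argmin(d))
                        if d[j] <= self.delta:    # approximate x by its neighbor
                            self.alpha[j] += self.eta * y
                            return
                    self.core.append(x)           # otherwise grow the core set
                    self.alpha.append(self.eta * y)

        # toy usage: two Gaussian blobs labelled -1 / +1
        rng = np.random.default_rng(0)
        model = AVMSketch()
        for _ in range(200):
            y = rng.choice([-1.0, 1.0])
            model.partial_fit(rng.normal(loc=y, scale=0.5, size=2), y)
        print(len(model.core))                    # bounded model size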
  • Discriminative Bayesian Nonparametric Clustering
    Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models.
    @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,
        AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },
        TITLE = { Discriminative Bayesian Nonparametric Clustering },
        BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. },
        FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },
        URL = { https://www.ijcai.org/proceedings/2017/355 },
    }
  • Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, and they allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. the data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that stochastic gradients of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. (A reparameterized-feature sketch follows the BibTeX record below.)
    @INPROCEEDINGS { tu_etal_ijcai17_rrf,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },
        YEAR = { 2017 },
        SERIES = { IJCAI'17 },
        ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, and they allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. the data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that stochastic gradients of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. },
        FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },
        LOCATION = { Melbourne, Australia },
        NUMPAGES = { 7 },
        URL = { https://www.ijcai.org/proceedings/2017/354 },
    }
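    A hedged PyTorch sketch of the reparameterization idea: the Fourier frequencies are written as omega = mu + sigma * eps with the base noise eps sampled once, so gradients flow back to the kernel parameters (mu, sigma). The Gaussian spectral family, feature dimension and squared loss here are illustrative, not the paper's exact setup.

        # Illustrative reparameterized random features with learnable kernel params.
        import math
        import torch

        D, d = 256, 10                                # feature dim, input dim
        eps = torch.randn(d, D)                       # base randomness, fixed
        b = 2 * math.pi * torch.rand(D)               # random phases

        mu = torch.zeros(d, 1, requires_grad=True)          # kernel location
        log_sigma = torch.zeros(d, 1, requires_grad=True)   # kernel scale (log)
        w = torch.zeros(D, requires_grad=True)              # linear model

        def features(x):                              # x: (n, d)
            omega = mu + torch.exp(log_sigma) * eps   # reparameterized freqs (d, D)
            return (2.0 / D) ** 0.5 * torch.cos(x @ omega + b)

        opt = torch.optim.SGD([mu, log_sigma, w], lr=0.1)
        x, y = torch.randn(32, d), torch.randn(32)    # toy streaming batch
        for _ in range(100):
            opt.zero_grad()
            loss = torch.mean((features(x) @ w - y) ** 2)
            loss.backward()                           # gradients reach mu, log_sigma
            opt.step()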
  • Column Networks for Collective Classification
    Trang Pham, Truyen Tran, Dinh Phung and Svetha Venkatesh. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf]
    Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.
    @CONFERENCE { pham_etal_aaai17column,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Column Networks for Collective Classification },
        BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2017 },
        ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals. },
        COMMENT = { Accepted },
        FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.14 },
        URL = { https://arxiv.org/abs/1609.04508 },
    }
  • Forward-Backward Smoothing for Hidden Markov Models of Point Pattern Data
    Nhan Dam, Dinh Phung, Ba-Ngu Vo and Viet Huynh. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 252-261, Tokyo, Japan, October 2017. [ | ]
    @INPROCEEDINGS { dam_etal_dsaa17forward,
        TITLE = { Forward-Backward Smoothing for Hidden {M}arkov Models of Point Pattern Data },
        AUTHOR = { Nhan Dam and Dinh Phung and Ba-Ngu Vo and Viet Huynh },
        BOOKTITLE = { 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) },
        MONTH = { October },
        YEAR = { 2017 },
        PAGES = { 252-261 },
        ADDRESS = { Tokyo, Japan },
        FILE = { :dam_etal_dsaa17forward - Forward Backward Smoothing for Hidden Markov Models of Point Pattern Data.pdf:PDF },
        OWNER = { ndam },
        TIMESTAMP = { 2017.08.28 },
    }
  • Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring
    Hung Nguyen, Sarah J. Maclagan, Tu Dinh Nguyen, Thin Nguyen, Paul Flemons, Kylie Andrews, Euan G. Ritchie and Dinh Phung. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017. (Honorable Mention Application Paper). [ | ]
    Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data on wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle for scientists and ecologists monitoring wildlife in an open environment. Leveraging recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, annotated by citizen scientists, and state-of-the-art deep convolutional neural network architectures to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can speed up research findings, enable more efficient citizen science-based monitoring systems and subsequent management decisions, and has the potential to make significant impacts on ecology and camera-trap image analysis. (A transfer-learning sketch follows the BibTeX record below.)
    @INPROCEEDINGS { hung_etal_dsaa17animal,
        AUTHOR = { Hung Nguyen and Sarah J. Maclagan and Tu Dinh Nguyen and Thin Nguyen and Paul Flemons and Kylie Andrews and Euan G. Ritchie and Dinh Phung },
        TITLE = { Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring },
        BOOKTITLE = { Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2017 },
        NOTE = { Honorable Mention Application Paper },
        ABSTRACT = { Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data on wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle for scientists and ecologists monitoring wildlife in an open environment. Leveraging recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, annotated by citizen scientists, and state-of-the-art deep convolutional neural network architectures to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can speed up research findings, enable more efficient citizen science-based monitoring systems and subsequent management decisions, and has the potential to make significant impacts on ecology and camera-trap image analysis. },
        FILE = { :hung_etal_dsaa17animal - Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring.pdf:PDF },
        OWNER = { hung },
        TIMESTAMP = { 2017.08.28 },
    }
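    A minimal transfer-learning sketch for the two tasks the abstract mentions (animal/non-animal filtering, then species identification). The ResNet-50 backbone, frozen-feature fine-tuning and input size are assumptions for illustration, not the authors' exact setup.

        # Illustrative fine-tuning of a pretrained CNN for camera-trap images.
        import torch
        import torch.nn as nn
        from torchvision import models

        def make_classifier(num_classes):
            net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
            for p in net.parameters():       # freeze the pretrained backbone
                p.requires_grad = False
            net.fc = nn.Linear(net.fc.in_features, num_classes)  # new head
            return net

        binary_net = make_classifier(2)      # image contains an animal or not
        species_net = make_classifier(3)     # three most common species

        x = torch.randn(4, 3, 224, 224)      # dummy batch of camera-trap frames
        print(binary_net(x).shape)           # torch.Size([4, 2])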
  • Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features
    Thin Nguyen, Duc Thanh Nguyen, Mark E. Larsen, Bridianne O'Dea, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. In Proceedings of the International Conference on World Wide Web (WWW), 2017. [ | | pdf]
    Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels. (One possible feature construction is sketched after the BibTeX record below.)
    @INPROCEEDINGS { nguyen_etal_www17prediction,
        AUTHOR = { Nguyen, Thin and Nguyen, Duc Thanh and Larsen, Mark E. and O'Dea, Bridianne and Yearwood, John and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },
        TITLE = { Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features },
        BOOKTITLE = { Proceedings of the International Conference on World Wide Web (WWW) },
        YEAR = { 2017 },
        ABSTRACT = { Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than did the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels. },
        FILE = { :nguyen_etal_www17prediction - Prediction of Population Health Indices from Social Media Using Kernel Based Textual and Temporal Features.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.03.25 },
        URL = { http://dl.acm.org/citation.cfm?id=3054136 },
    }
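    The abstract leaves the exact construction to the paper; one plausible reading, offered for illustration only, is to summarize a county by pairwise kernel values between the per-tweet distributions of its low-level features. Everything below (the RBF kernel, the pairing scheme, the toy data) is an assumption.

        # Hypothetical mid-level "kernel-based" features over a population.
        import numpy as np

        def rbf(u, v, gamma=1.0):
            return np.exp(-gamma * np.sum((u - v) ** 2))

        def population_features(T, gamma=1.0):
            # T: (num_tweets, num_lowlevel_features) matrix for one county;
            # column j is the distribution of feature j over the county's tweets.
            d = T.shape[1]
            F = [rbf(T[:, i], T[:, j], gamma) for i in range(d) for j in range(i, d)]
            return np.asarray(F)     # one mid-level feature vector per county

        county = np.random.rand(100, 8)            # toy: 100 tweets, 8 features
        print(population_features(county).shape)   # (36,) = d*(d+1)/2 pairs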
  • Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities
    Bo Dao, Thin Nguyen, Svetha Venkatesh and Dinh Phung. International Journal of Data Science and Analytics, 4:209-231, November 2017. [ | | pdf]
    Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding the topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however, analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions, including depression and autism. Large datasets from online communities were crawled. We analyse sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry.
    @ARTICLE { Dao_etal_17Latent,
        AUTHOR = { Bo Dao and Thin Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities },
        JOURNAL = { International Journal of Data Science and Analytics },
        YEAR = { 2017 },
        VOLUME = { 4 },
        PAGES = { 209--231 },
        MONTH = { November },
        ABSTRACT = { Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding the topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however, analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions, including depression and autism. Large datasets from online communities were crawled. We analyse sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry. },
        FILE = { :Dao_etal_17Latent - Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.31 },
        URL = { https://link.springer.com/article/10.1007/s41060-017-0073-y },
    }
  • Estimating support scores of autism communities in large-scale Web information systems
    Thin Nguyen, Hung Nguyen, Svetha Venkatesh and Dinh Phung. In Proceedings of the International Conference on Web Information Systems Engineering (WISE), Springer, 2017. [ | ]
    Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. Thus, while rarely found in the real world, autism communities are popular in Web-based forums, which are convenient places for people with ASD to seek and share health-related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate the support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantity and quality of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports. (A Spark aggregation sketch follows the BibTeX record below.)
    @INCOLLECTION { Nguyen_etal_17Estimating,
        AUTHOR = { Thin Nguyen and Hung Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Estimating support scores of autism communities in large-scale Web information systems },
        BOOKTITLE = { Proceedings of the International Conference on Web Information Systems Engineering (WISE) },
        PUBLISHER = { Springer },
        YEAR = { 2017 },
        SERIES = { Lecture Notes in Computer Science },
        ABSTRACT = { Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. Thus, while rarely found in the real world, autism communities are popular in Web-based forums, which are convenient places for people with ASD to seek and share health-related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate the support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantity and quality of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports. },
        FILE = { :Nguyen_etal_17Estimating - Estimating Support Scores of Autism Communities in Large Scale Web Information Systems.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.28 },
    }
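    A hedged PySpark sketch of per-subreddit aggregation over a large comment dump. The field names (subreddit, body, parent_id) follow the public Reddit dumps and the statistics are simple proxies; the paper's actual support scores (conversational involvement, emotional and informational support) are richer.

        # Illustrative Spark aggregation of crude "support" statistics.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("support-scores").getOrCreate()
        comments = spark.read.json("reddit_comments.json")   # path is illustrative

        scores = (comments
                  .groupBy("subreddit")
                  .agg(F.count("*").alias("n_comments"),
                       F.avg(F.length("body")).alias("avg_comment_length"),
                       F.countDistinct("parent_id").alias("n_threads")))

        scores.filter(F.col("subreddit") == "aspergers").show()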
  • Kernel-based features for predicting population health indices from geocoded social media data
    Thin Nguyen, Mark E. Larsen, Bridianne O'Dea, Duc Thanh Nguyen, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. Decision Support Systems, 2017. [ | | pdf]
    When using tweets to predict a population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This alleviates the computational cost of extracting features from each individual tweet. On the other hand, much information on the population could be lost, as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features, and those relationships could also convey predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for the prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels.
    @ARTICLE { Nguyen_etal_17Kernel,
        AUTHOR = { Thin Nguyen and Mark E. Larsen and Bridianne O'Dea and Duc Thanh Nguyen and John Yearwood and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Kernel-based features for predicting population health indices from geocoded social media data },
        JOURNAL = { Decision Support Systems },
        YEAR = { 2017 },
        VOLUME = { 0 },
        NUMBER = { 0 },
        PAGES = { 1-34 },
        ABSTRACT = { When using tweets to predict a population health index, due to the large scale of data, an aggregation of tweets by population has been a popular practice in learning features to characterize the population. This alleviates the computational cost of extracting features from each individual tweet. On the other hand, much information on the population could be lost, as the distribution of textual features of a population could be important for identifying the health index of that population. In addition, there could be relationships between features, and those relationships could also convey predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for the prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. },
        FILE = { :Nguyen_etal_17Kernel - Kernel Based Features for Predicting Population Health Indices from Geocoded Social Media Data.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.07.01 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0167923617301227 },
    }
  • Estimation of the prevalence of adverse drug reactions from social media
    Thin Nguyen, Mark Larsen, Bridianne O'Dea, Dinh Phung, Svetha Venkatesh and Helen Christensen. International Journal of Medical Informatics (IJMI), 2017. [ | | pdf]
    This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. The improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced machine learning techniques to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. (A term-expansion sketch follows the BibTeX record below.)
    @ARTICLE { nguyen_etal_jmi17estimation,
        AUTHOR = { Thin Nguyen and Mark Larsen and Bridianne O'Dea and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Estimation of the prevalence of adverse drug reactions from social media },
        JOURNAL = { International Journal of Medical Informatics (IJMI) },
        YEAR = { 2017 },
        PAGES = { 1--17 },
        ABSTRACT = { This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. The improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced machine learning techniques to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. },
        FILE = { :nguyen_etal_jmi17estimation - Estimation of the Prevalence of Adverse Drug Reactions from Social Media.pdf:PDF },
        URL = { http://www.sciencedirect.com/science/article/pii/S1386505617300746 },
    }
  • Hierarchical semi-Markov conditional random fields for deep recursive sequential data
    Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. [ | | pdf]
    We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { tran_etal_aij17hierarchical,
        AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },
        TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2017 },
        MONTH = { Feb. },
        ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },
        KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.21 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },
    }
  • See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I had described the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version of this for the directed case has also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
  • Streaming Clustering with Bayesian Nonparametric Models
    Viet Huynh and Dinh Phung. Neurocomputing, 258:52-62, October 2017. [ | | pdf]
    Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inference in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update which is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm, where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework for real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated-version variational inference algorithms on both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents.
    @ARTICLE { huynh_phung_neuro17streaming,
        AUTHOR = { Viet Huynh and Dinh Phung },
        TITLE = { Streaming Clustering with Bayesian Nonparametric Models },
        JOURNAL = { Neurocomputing },
        YEAR = { 2017 },
        VOLUME = { 258 },
        PAGES = { 52--62 },
        MONTH = { October },
        ISSN = { 0925-2312 },
        ABSTRACT = { Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inference in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update which is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm, where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework for real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated-version variational inference algorithms on both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents. },
        FILE = { :huynh_phung_neuro17streaming - Streaming Clustering with Bayesian Nonparametric Models.pdf:PDF },
        KEYWORDS = { streaming learning, Bayesian nonparametric, variational Bayes inference, Dirichlet process, Dirichlet process mixtures, heterogeneous data sources },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.18 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0925231217304253 },
    }
  • Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions
    Budhaditya Saha, Sunil Gupta, Dinh Phung and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2017. [ | | pdf]
    Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations, etc. However, EMR data analysis is complicated by missing entries. There are two reasons - the “primary reason for admission” is included in EMR, but the comorbidities (other chronic diseases) are left uncoded, and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data - unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction is improved significantly (from 0.729 to 0.741) for the Cancer data and (from 0.699 to 0.723) for the AMI data. Similarly, the AUC for 3-month emergency admission prediction improves (from 0.730 to 0.752) for the Cancer data and (from 0.682 to 0.724) for the AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and admissions in hospital over 3, 6 and 12 month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods.
    @ARTICLE { budhaditya_gupta_phung_venkatesh_kais17effective,
        AUTHOR = { Budhaditya Saha and Sunil Gupta and Dinh Phung and Svetha Venkatesh },
        TITLE = { Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2017 },
        ABSTRACT = { Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations, etc. However, EMR data analysis is complicated by missing entries. There are two reasons - the “primary reason for admission” is included in EMR, but the comorbidities (other chronic diseases) are left uncoded, and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data - unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction is improved significantly (from 0.729 to 0.741) for the Cancer data and (from 0.699 to 0.723) for the AMI data. Similarly, the AUC for 3-month emergency admission prediction improves (from 0.730 to 0.752) for the Cancer data and (from 0.682 to 0.724) for the AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and admissions in hospital over 3, 6 and 12 month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods. },
        FILE = { :budhaditya_gupta_phung_venkatesh_kais17effective - Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.17 },
        URL = { https://link.springer.com/article/10.1007/s10115-017-1038-0 },
    }
  • Energy-Based Localized Anomaly Detection in Video Surveillance
    Hung Vu, Tu Dinh Nguyen, Anthony Travers, Svetha Venkatesh and Dinh Phung. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Jeju, South Korea, May 23-26 2017. (Best Application Paper Award). [ | | pdf]
    Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing collection of data without knowledge of what should be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings: in the offline setting the trained parameters of the model are fixed, whilst in the streaming setting they are updated incrementally as video data arrive in a stream. Experiments on three public benchmark video datasets show that our proposed method can detect and localize the abnormalities at pixel level with better accuracy than the baselines, and achieve competitive performance compared with state-of-the-art approaches. Moreover, as the RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework.
    @INPROCEEDINGS { vu_etal_pakdd17energy,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Anthony Travers and Svetha Venkatesh and Dinh Phung },
        TITLE = { Energy-Based Localized Anomaly Detection in Video Surveillance },
        BOOKTITLE = { The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2017 },
        EDITOR = { Jinho Kim and Kyuseok Shim and Longbing Cao and Jae-Gil Lee and Xuemin Lin and Yang-Sae Moon },
        ADDRESS = { Jeju, South Korea },
        MONTH = { May 23-26 },
        NOTE = { Best Application Paper Award },
        ABSTRACT = { Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing collection of data without knowledge of what should be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings: in the offline setting the trained parameters of the model are fixed, whilst in the streaming setting they are updated incrementally as video data arrive in a stream. Experiments on three public benchmark video datasets show that our proposed method can detect and localize the abnormalities at pixel level with better accuracy than the baselines, and achieve competitive performance compared with state-of-the-art approaches. Moreover, as the RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework. },
        FILE = { :vu_etal_pakdd17energy - Energy Based Localized Anomaly Detection in Video Surveillance.pdf:PDF },
        OWNER = { hungv },
        TIMESTAMP = { 2017.01.31 },
        URL = { https://link.springer.com/chapter/10.1007/978-3-319-57454-7_50 },
    }
2016
  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expensive running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in the context of streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expensive running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the curse of big data cross-validation to select optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }
  • Dual Space Gradient Descent for Online Learning
    Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing (NIPS), December 2016. [ | | pdf]
    One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain the information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide a convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines.
    @CONFERENCE { le_etal_nips16dual,
        AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },
        TITLE = { Dual Space Gradient Descent for Online Learning },
        BOOKTITLE = { Advances in Neural Information Processing (NIPS) },
        YEAR = { 2016 },
        MONTH = { December },
        ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging, in the literature, are known to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain the information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide a convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines. },
        FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.16 },
        URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },
    }
  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [ | | pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }
  • Budgeted Semi-supervised Support Vector Machine
    Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf]
    @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,
        AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },
        TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },
        BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },
    }
  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }
  • Introduction: special issue of selected papers from ACML 2014
    Dinh Phung, Hang Li, Tru Cao, Tu-Bao Ho and Zhi-Hua Zhou, editors. volume 103, Springer, May 2016. [ | | pdf]
    @PROCEEDINGS { li_phung_cao_ho_zhou_acml14_selectedpapers,
        TITLE = { Introduction: special issue of selected papers from {ACML} 2014 },
        YEAR = { 2016 },
        EDITOR = { Dinh Phung and Hang Li and Tru Cao and Tu-Bao Ho and Zhi-Hua Zhou },
        VOLUME = { 103 },
        NUMBER = { 2 },
        PUBLISHER = { Springer },
        MONTH = { May },
        FILE = { :li_phung_cao_ho_zhou_acml14_selectedpapers - Introduction_ Special Issue of Selected Papers from ACML 2014.pdf:PDF },
        ISSN = { 1573-0565 },
        JOURNAL = { Machine Learning },
        OWNER = { Thanh-Binh Nguyen },
        PAGES = { 137--139 },
        TIMESTAMP = { 2016.04.11 },
        URL = { http://dx.doi.org/10.1007/s10994-016-5549-9 },
    }
  • Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View
    Luo, Wei, Phung, Dinh, Tran, Truyen, Gupta, Sunil, Rana, Santu, Karmakar, Chandan, Shilton, Alistair, Yearwood, John, Dimitrova, Nevenka, Ho, Bao Tu, Venkatesh, Svetha and Berk, Michael. J Med Internet Res, 18(12):e323, Dec 2016. [ | | pdf]
    Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community.
    @ARTICLE { Luo_etal_jmir16guidelines,
        AUTHOR = { Luo, Wei and Phung, Dinh and Tran, Truyen and Gupta, Sunil and Rana, Santu and Karmakar, Chandan and Shilton, Alistair and Yearwood, John and Dimitrova, Nevenka and Ho, Bao Tu and Venkatesh, Svetha and Berk, Michael },
        TITLE = { Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View },
        JOURNAL = { J Med Internet Res },
        YEAR = { 2016 },
        VOLUME = { 18 },
        NUMBER = { 12 },
        PAGES = { e323 },
        MONTH = { Dec },
        ABSTRACT = { Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. },
        DAY = { 16 },
        DOI = { 10.2196/jmir.5870 },
        FILE = { :Luo_etal_jmir16guidelines - Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research_ a Multidisciplinary View.pdf:PDF },
        KEYWORDS = { machine learning, clinical prediction rule, guideline },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.12.21 },
        URL = { http://www.jmir.org/2016/12/e323/ },
    }
  • Data Clustering Using Side Information Dependent Chinese Restaurant Processes
    Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(2):463-488, May 2016. [ | | pdf]
    Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach.
    @ARTICLE { li_rana_phung_venkatesh_kais16,
        AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Data Clustering Using Side Information Dependent {C}hinese Restaurant Processes },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 47 },
        NUMBER = { 2 },
        PAGES = { 463--488 },
        MONTH = { May },
        ABSTRACT = { Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach. },
        DOI = { 10.1007/s10115-015-0834-7 },
        FILE = { :li_rana_phung_venkatesh_kais16 - Data Clustering Using Side Information Dependent Chinese Restaurant Processes.pdf:PDF },
        KEYWORDS = { Side information Similarity Data clustering Bayesian nonparametric models },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.03.02 },
        URL = { http://link.springer.com/article/10.1007/s10115-015-0834-7 },
    }
  • Multiple Kernel Learning with Data Augmentation
    Nguyen, Khanh, Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 8th Asian Conference on Machine Learning (ACML), Nov. 2016. [ | ]
    @CONFERENCE { nguyen_etal_acml16multiple,
        AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Multiple Kernel Learning with Data Augmentation },
        BOOKTITLE = { 8th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2016 },
        MONTH = { Nov. },
        FILE = { :nguyen_etal_acml16multiple - Multiple Kernel Learning with Data Augmentation.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
  • Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Advances in Artificial Intelligence, pages 455-468, Springer, 2016. (Student travel award). [ | | pdf]
    Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening.
    @INCOLLECTION { nguyen_etal_ai16exceptional,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious },
        BOOKTITLE = { Advances in Artificial Intelligence },
        PUBLISHER = { Springer },
        YEAR = { 2016 },
        VOLUME = { 9992 },
        PAGES = { 455--468 },
        NOTE = { Student travel award },
        ABSTRACT = { Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening. },
        FILE = { :nguyen_etal_ai16exceptional - Exceptional Contrast Set Mining_ Moving beyond the Deluge of the Obvious.pdf:PDF },
        GROUPS = { Contrast Set Mining },
        ORGANIZATION = { Springer },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.01.05 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-50127-7_39 },
    }
  • SECC: Simultaneous extraction of context and community from pervasive signals
    Nguyen, T., Nguyen, V., Salim, F.D. and Phung, D. In IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM), pages 1-9, March 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.
    @INPROCEEDINGS { nguyen_nguyen_salim_phung_percom16secc,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Phung, D. },
        TITLE = { {SECC}: Simultaneous extraction of context and community from pervasive signals },
        BOOKTITLE = { IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM) },
        YEAR = { 2016 },
        PAGES = { 1-9 },
        MONTH = { March },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated. },
        DOI = { 10.1109/PERCOM.2016.7456501 },
        FILE = { :nguyen_nguyen_salim_phung_percom16secc - SECC_ Simultaneous Extraction of Context and Community from Pervasive Signals.pdf:PDF },
        KEYWORDS = { Bluetooth;Context;Context modeling;Data mining;Data models;Feature extraction;Mixture models },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7456501 },
    }
  • Nonparametric discovery of movement patterns from accelerometer signals
    Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D. Pattern Recognition Letters, 70(C):52-58, Jan. 2016. [ | | pdf]
    Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. With Bayesian nonparametric priors over the parameters, the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance.
    @ARTICLE { nguyen_gupta_venkatesh_phung_pr16nonparametric,
        AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },
        TITLE = { Nonparametric discovery of movement patterns from accelerometer signals },
        JOURNAL = { Pattern Recognition Letters },
        YEAR = { 2016 },
        VOLUME = { 70 },
        NUMBER = { C },
        PAGES = { 52--58 },
        MONTH = { Jan. },
        ISSN = { 0167-8655 },
        ABSTRACT = { Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. With Bayesian nonparametric priors over the parameters, the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance. },
        DOI = { http://dx.doi.org/10.1016/j.patrec.2015.11.003 },
        FILE = { :nguyen_gupta_venkatesh_phung_pr16nonparametric - Nonparametric Discovery of Movement Patterns from Accelerometer Signals.pdf:PDF },
        KEYWORDS = { Accelerometer, Activity recognition, Bayesian nonparametric, Dirichlet process, Movement intensity },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://www.sciencedirect.com/science/article/pii/S016786551500389X },
    }
  • Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data
    Tran, Truyen, Luo, Wei, Phung, Dinh, Morris, Jonathan, Rickard, Kristen and Venkatesh, Svetha. In Proceedings of the 1st Machine Learning for Healthcare Conference, pages 164-177, 2016. [ | | pdf]
    Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has a sensitivity of 62.3% at a specificity of 81.5%.
    @INPROCEEDINGS { tran_etal_mlhc16pretern,
        AUTHOR = { Tran, Truyen and Luo, Wei and Phung, Dinh and Morris, Jonathan and Rickard, Kristen and Venkatesh, Svetha },
        TITLE = { Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data },
        BOOKTITLE = { Proceedings of the 1st Machine Learning for Healthcare Conference },
        YEAR = { 2016 },
        EDITOR = { Finale Doshi-Velez and Jim Fackler and David Kale and Byron Wallace and Jenna Weins },
        VOLUME = { 56 },
        SERIES = { JMLR Workshop and Conference Proceedings },
        PAGES = { 164--177 },
        PUBLISHER = { JMLR },
        ABSTRACT = { Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has a sensitivity of 62.3% at a specificity of 81.5%. },
        FILE = { :tran_etal_mlhc16pretern - Preterm Birth Prediction_ Stable Selection of Interpretable Rules from High Dimensional Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.02 },
        URL = { http://jmlr.org/proceedings/papers/v56/Tran16.html },
    }
  • Computer Assisted Autism Interventions for India
    Vellanki, Pratibha, Greenhill, Stewart, Duong, Thi, Phung, Dinh, Venkatesh, Svetha, Godwin, Jayashree, Achary, Kishna V. and Varkey, Blessin. In Proceedings of the 28th Australian Conference on Computer-Human Interaction, pages 618-622, New York, NY, USA, 2016. [ | | pdf]
    @INPROCEEDINGS { vellanki_etal_ozchi16computer,
        AUTHOR = { Vellanki, Pratibha and Greenhill, Stewart and Duong, Thi and Phung, Dinh and Venkatesh, Svetha and Godwin, Jayashree and Achary, Kishna V. and Varkey, Blessin },
        TITLE = { Computer Assisted Autism Interventions for {I}ndia },
        BOOKTITLE = { Proceedings of the 28th Australian Conference on Computer-Human Interaction },
        YEAR = { 2016 },
        SERIES = { OzCHI '16 },
        PAGES = { 618--622 },
        ADDRESS = { New York, NY, USA },
        PUBLISHER = { ACM },
        ACMID = { 3011007 },
        DOI = { 10.1145/3010915.3011007 },
        FILE = { :vellanki_etal_ozchi16computer - Computer Assisted Autism Interventions for India.pdf:PDF },
        ISBN = { 978-1-4503-4618-4 },
        KEYWORDS = { Hindi, India, assistive technology, autism, early intervention, translation },
        LOCATION = { Launceston, Tasmania, Australia },
        NUMPAGES = { 5 },
        URL = { http://doi.acm.org/10.1145/3010915.3011007 },
    }
  • A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process
    Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated.
    @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },
        TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2016 },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },
        DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },
        FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.17 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },
    }
J
  • Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data
    Vellanki, Pratibha, Duong, Thi, Gupta, Sunil, Venkatesh, Svetha and Phung, Dinh. Knowledge and Information Systems (KAIS), 2016. [ | | pdf]
    The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which, being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning.
    @ARTICLE { vellanki_etal_kis16nonparametric,
        AUTHOR = { Vellanki, Pratibha and Duong, Thi and Gupta, Sunil and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        PAGES = { 1--31 },
        ISSN = { 0219-3116 },
        ABSTRACT = { The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which, being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning. },
        DOI = { 10.1007/s10115-016-0971-7 },
        FILE = { :vellanki_etal_kis16nonparametric - Nonparametric Discovery and Analysis of Learning Patterns and Autism Subgroups from Therapeutic Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.02 },
        URL = { http://dx.doi.org/10.1007/s10115-016-0971-7 },
    }
J
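    A minimal Python sketch of one idea from the entry above: matrix factorisation that updates only on observed cells, so missing entries are ignored rather than imputed. This is a generic masked NMF on synthetic counts, not the authors' LPGM/IBP model; all names and parameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.poisson(3.0, size=(30, 20)).astype(float)   # synthetic count matrix
    M = (rng.random(X.shape) < 0.7).astype(float)       # 1 = observed, 0 = missing
    k = 4
    W = rng.random((30, k))
    H = rng.random((k, 20))

    eps = 1e-9
    for _ in range(200):
        # Multiplicative updates restricted to observed cells via the mask M.
        W *= ((M * X) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ (M * X)) / (W.T @ (M * (W @ H)) + eps)

    err = np.abs(M * (X - W @ H)).sum() / M.sum()
    print("mean absolute error on observed cells:", round(float(err), 3))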
  • Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data
    Gopakumar, Shivapratap, Tran, Truyen, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. JMIR Med Inform, 4(3):e25, Jul 2016. [ | | pdf]
    Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. While the autoregressive integrated moving average model relied on the past 3 months of discharges, nearest neighbor forecasting used the median of similar past discharges to estimate next-day discharge. In addition, the ARMAX model used the day of the week and the number of patients currently in the ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance, with the random forests achieving a 22.7% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments.
    @ARTICLE { gopakumar_etal_jmir16forecasting,
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { JMIR Med Inform },
        TITLE = { Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data },
        YEAR = { 2016 },
        MONTH = { Jul },
        NUMBER = { 3 },
        PAGES = { e25 },
        VOLUME = { 4 },
        ABSTRACT = { Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. While the autoregressive integrated moving average model relied on the past 3 months of discharges, nearest neighbor forecasting used the median of similar past discharges to estimate next-day discharge. In addition, the ARMAX model used the day of the week and the number of patients currently in the ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance, with the random forests achieving a 22.7\% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. },
        DAY = { 21 },
        DOI = { 10.2196/medinform.5650 },
        FILE = { :gopakumar_etal_jmir16forecasting - Forecasting Daily Patient Outflow from a Ward Having No Real Time Clinical Data.pdf:PDF },
        KEYWORDS = { patient flow },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.02 },
        URL = { http://medinform.jmir.org/2016/3/e25/ },
    }
J
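    A hedged sketch of the forecasting setup described above, using one of the five compared regressors (a random forest) on synthetic day-of-week and ward-occupancy features; the data and feature set are invented stand-ins, not the study's 20 patient and 88 ward-level features.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    days = 1826
    dow = np.arange(days) % 7                                # day of week
    occupancy = rng.poisson(30, size=days)                   # patients in the ward
    discharges = rng.poisson(4 + 2 * (dow < 5), size=days)   # weekday/weekend pattern

    # Features on day t predict discharges on day t+1.
    X = np.column_stack([dow[:-1], occupancy[:-1], discharges[:-1]])
    y = discharges[1:]
    split = int(0.8 * len(y))

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X[:split], y[:split])
    print("MAE:", mean_absolute_error(y[split:], model.predict(X[split:])))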
  • Control Matching via Discharge Code Sequences
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Machine Learning for Health @ NIPS 2016, 2016. [ | ]
    In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We report results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant.
    @CONFERENCE { nguyen_etal_mlh16control,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Control Matching via Discharge Code Sequences },
        BOOKTITLE = { Machine Learning for Health @ NIPS 2016 },
        YEAR = { 2016 },
        ABSTRACT = { In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We report results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant. },
        FILE = { :nguyen_etal_mlh16control - Control Matching Via Discharge Code Sequences.pdf:PDF },
        JOURNAL = { arXiv preprint arXiv:1612.01812 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.06 },
    }
C
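    An illustrative sketch of the two-step idea above, assuming gensim 4 is available: embed ICD codes with Word2Vec, represent each patient as the mean of their code vectors, and match each case to its most similar control. The codes are made up, and the sequential, without-replacement aspect of the authors' matching is omitted.

    import numpy as np
    from gensim.models import Word2Vec

    cases = [["C50", "Z51", "E11"], ["C34", "J44"]]             # case code sequences
    controls = [["E11", "I10"], ["J44", "J45"], ["I10", "I25"]]

    model = Word2Vec(sentences=cases + controls, vector_size=16,
                     window=5, min_count=1, seed=0)

    def embed(seq):
        # A patient is represented by the mean of their code vectors.
        return np.mean([model.wv[c] for c in seq], axis=0)

    case_vecs = np.array([embed(s) for s in cases])
    ctrl_vecs = np.array([embed(s) for s in controls])

    # Cosine-similarity nearest neighbour (with replacement) per case.
    sims = case_vecs @ ctrl_vecs.T / (
        np.linalg.norm(case_vecs, axis=1, keepdims=True)
        * np.linalg.norm(ctrl_vecs, axis=1))
    print("matched control for each case:", sims.argmax(axis=1))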
  • Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies, Nov. 2016. (Best Runner-up Student Paper Award). [ | ]
    Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental wellbeing. In this paper, we examine whether social capital, based on the levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and the online depression community. We explore apparent properties of textual content, including expressed emotions, language styles and latent topics, of a large corpus of blog posts, to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when they are characterized by a set of latent topics and language features derived from blog posts; these discriminative features proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings.
    @CONFERENCE { dao_etal_rivf16effect,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community },
        BOOKTITLE = { 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies },
        YEAR = { 2016 },
        MONTH = { Nov. },
        NOTE = { Best Runner-up Student Paper Award },
        ABSTRACT = { Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental wellbeing. In this paper, we examine whether social capital, based on the levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and the online depression community. We explore apparent properties of textual content, including expressed emotions, language styles and latent topics, of a large corpus of blog posts, to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when they are characterized by a set of latent topics and language features derived from blog posts; these discriminative features proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings. },
        FILE = { :dao_etal_rivf16effect - Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
    }
C
  • MCNC: Multi-channel Nonparametric Clustering from Heterogeneous Data
    Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D.. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. (Finalist Best IBM Track 1 Student Paper Award). [ | ]
    Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence failing to benefit from the correlating information between them -- or require data sources to be specified as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model to be a product-space. We demonstrate our framework on synthetic and real-world datasets to discover the identity--location--time (a.k.a. who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data.
    @CONFERENCE { nguyen_nguyen_venkatesh_phung_icpr16mcnc,
        AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },
        TITLE = { {MCNC}: Multi-channel Nonparametric Clustering from Heterogeneous Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        NOTE = { Finalist Best IBM Track 1 Student Paper Award },
        ABSTRACT = { Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence failing to benefit from the correlating information between them -- or require data sources to be specified as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model to be a product-space. We demonstrate our framework on synthetic and real-world datasets to discover the identity--location--time (a.k.a. who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data. },
        FILE = { :nguyen_nguyen_venkatesh_phung_icpr16mcnc - MCNC_ Multi Channel Nonparametric Clustering from Heterogeneous Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
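    A loose stand-in, not the MCNC sampler: clustering heterogeneous who-where-when events with an off-the-shelf Dirichlet-process mixture over concatenated one-hot channels (assumes a recent scikit-learn for the `sparse_output` argument). It only illustrates treating the channels jointly through a product-like representation.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)
    n = 300
    events = np.column_stack([
        rng.integers(0, 5, n),    # who   (user id)
        rng.integers(0, 8, n),    # where (location id)
        rng.integers(0, 24, n),   # when  (hour of day)
    ])

    X = OneHotEncoder(sparse_output=False).fit_transform(events)
    dpmm = BayesianGaussianMixture(
        n_components=20,          # truncation level; effective clusters are fewer
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="diag", max_iter=200, random_state=0).fit(X)
    print("non-empty clusters:", np.unique(dpmm.predict(X)).size)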
  • Stable Clinical Prediction using Graph Support Vector Machines
    Kamkar, Iman, Gupta, Sunil, Li, Cheng, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { kamkar_gupta_li_phung_venkatesh_icpr16stable,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Li, Cheng and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable Clinical Prediction using Graph {S}upport {V}ector {M}achines },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :kamkar_gupta_li_phung_venkatesh_icpr16stable - Stable Clinical Prediction Using Graph Support Vector Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Distributed Data Augmented Support Vector Machine on Spark
    Nguyen, Tu, Nguyen, Vu, Le, Trung and Phung, Dinh. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { nguyen_nguyen_le_phung_icpr16distributed,
        AUTHOR = { Nguyen, Tu and Nguyen, Vu and Le, Trung and Phung, Dinh },
        TITLE = { Distributed Data Augmented {S}upport {V}ector {M}achine on {S}park },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :nguyen_nguyen_le_phung_icpr16distributed - Distributed Data Augmented Support Vector Machine on Spark.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Faster Training of Very Deep Networks via p-Norm Gates
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { pham_tran_phung_venkatesh_icpr16faster,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Faster Training of Very Deep Networks via p-Norm Gates },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :pham_tran_phung_venkatesh_icpr16faster - Faster Training of Very Deep Networks Via P Norm Gates.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Transfer Learning for Rare Cancer Problems via Discriminative Sparse Gaussian Graphical Model
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { budhaditya_gupta_phung_venkatesh_icpr16transfer,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Transfer Learning for Rare Cancer Problems via Discriminative Sparse {G}aussian Graphical Model },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :budhaditya_gupta_phung_venkatesh_icpr16transfer - Transfer Learning for Rare Cancer Problems Via Discriminative Sparse Gaussian Graphical Model.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Model-based Classification and Novelty Detection For Point Pattern Data
    Vo, Ba-Ngu, Tran, Nhat-Quang, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { vo_tran_phung_vo_icpr16model,
        AUTHOR = { Vo, Ba-Ngu and Tran, Nhat-Quang and Phung, Dinh and Vo, Ba-Tuong },
        TITLE = { Model-based Classification and Novelty Detection For Point Pattern Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :vo_tran_phung_vo_icpr16model - Model Based Classification and Novelty Detection for Point Pattern Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Clustering For Point Pattern Data
    Tran, Nhat-Quang, Vo, Ba-Ngu, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { tran_vo_phung_vo_icpr16clustering,
        AUTHOR = { Tran, Nhat-Quang and Vo, Ba-Ngu and Phung, Dinh and Vo, Ba-Tuong },
        TITLE = { Clustering For Point Pattern Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :tran_vo_phung_vo_icpr16clustering - Clustering for Point Pattern Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Discriminative cues for different stages of smoking cessation in online community
    Nguyen, Thin, Borland, Ron, Yearwood, John, Yong, Hua, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ]
    Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled, including thousands of posts made by thousands of users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from the second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in studies of smoking and other addictions in online settings.
    @INPROCEEDINGS { nguyen_etal_wise16discriminative,
        AUTHOR = { Nguyen, Thin and Borland, Ron and Yearwood, John and Yong, Hua and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Discriminative cues for different stages of smoking cessation in online community },
        BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2016 },
        SERIES = { Lecture Notes in Computer Science },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled, including thousands of posts made by thousands of users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from the second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in studies of smoking and other addictions in online settings. },
        FILE = { :nguyen_etal_wise16discriminative - Discriminative Cues for Different Stages of Smoking Cessation in Online Community.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2016.07.14 },
    }
C
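    A toy sketch of the discrimination task above: separating first-week (S0) posts from later-stage posts with simple text features and a linear classifier. The posts are invented examples, not data from r/stopsmoking.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    posts = ["day 2, the cravings are brutal", "one week in, still tough",
             "six months smoke free today", "a year later and no regrets"]
    labels = [0, 0, 1, 1]   # 0 = S0 (first week), 1 = later stages

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(posts, labels)
    print(clf.predict(["three days without a cigarette"]))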
  • Large-scale stylistic analysis of formality in academia and social media
    Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ]
    The dictum `publish or perish' has influenced the way scientists present research results so as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, it has recently been found that the proportion of positive words in the content of scientific articles has risen over the last 40 years, which probably reflects a tendency of scientists to exaggerate and overstate their research results. This practice may deviate from the impersonal and formal style of academic writing. In this study the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. These aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends in these stylistic features in scientific publications over the last four decades are also discovered. Advances in cluster computing are employed to process large-scale data, with 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale.
    @INPROCEEDINGS { nguyen_etal_wise16LargeScale,
        AUTHOR = { Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Large-scale stylistic analysis of formality in academia and social media },
        BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2016 },
        SERIES = { Lecture Notes in Computer Science },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The dictum `publish or perish' has influenced the way scientists present research results so as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, it has recently been found that the proportion of positive words in the content of scientific articles has risen over the last 40 years, which probably reflects a tendency of scientists to exaggerate and overstate their research results. This practice may deviate from the impersonal and formal style of academic writing. In this study the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. These aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends in these stylistic features in scientific publications over the last four decades are also discovered. Advances in cluster computing are employed to process large-scale data, with 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale. },
        FILE = { :nguyen_etal_wise16LargeScale - Large Scale Stylistic Analysis of Formality in Academia and Social Media.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2016.07.14 },
    }
C
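    A minimal sketch of the kind of stylistic measurement the entry above scales up: per-document rates of first-person pronouns and emotion-bearing words. The word lists are tiny stand-ins for proper lexicons.

    FIRST_PERSON = {"i", "we", "my", "our", "me", "us"}
    EMOTION = {"exciting", "remarkable", "novel", "promising"}

    def style_rates(text):
        # Rate of first-person pronouns and emotion words per token.
        words = text.lower().split()
        n = max(len(words), 1)
        return (sum(w in FIRST_PERSON for w in words) / n,
                sum(w in EMOTION for w in words) / n)

    print(style_rates("We report a novel and promising treatment"))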
  • Learning Multifaceted Latent Activities from Heterogeneous Mobile Data
    Nguyen, Thanh-Binh, Nguyen, Vu, Nguyen, Thuong, Venkatesh, Svetha, Kumar, Mohan and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ]
    Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure of the base measure being a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset – the StudentLife dataset. We report results and provide analysis of the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics including F1-score, NMI, RI, and purity, and compare them with well-known existing baseline methods.
    @INPROCEEDINGS { nguyen_etal_dsaa16learning,
        AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Vu and Nguyen, Thuong and Venkatesh, Svetha and Kumar, Mohan and Phung, Dinh },
        TITLE = { Learning Multifaceted Latent Activities from Heterogeneous Mobile Data },
        BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2016 },
        MONTH = { Oct. },
        ABSTRACT = { Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure of the base measure being a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset – the StudentLife dataset. We report results and provide analysis of the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics including F1-score, NMI, RI, and purity, and compare them with well-known existing baseline methods. },
        FILE = { :nguyen_etal_dsaa16learning - Learning Multifaceted Latent Activities from Heterogeneous Mobile Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.01 },
    }
C
  • Analysing the History of Autism Spectrum Disorder using Topic Models
    Beykikhoshk, Adham, Arandjelovi\'{c}, Ognjen, Venkatesh, Svetha and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ]
    We describe a novel framework for the discovery of underlying topics in a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike social media or news data, where topic nuances in science cause new scientific directions to emerge, a suitable approach to modelling longitudinal literature data is to use topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time, or do not share the topics over epochs when they model time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that the topics are shared across epochs and across the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiments and the source code of the model freely available to aid other researchers in analysing the results or applying the model to their own data collections.
    @INPROCEEDINGS { beykikhoshk_etal_dsaa16analysing,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Analysing the History of Autism Spectrum Disorder using Topic Models },
        BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2016 },
        MONTH = { Oct. },
        ABSTRACT = { We describe a novel framework for the discovery of underlying topics in a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike social media or news data, where topic nuances in science cause new scientific directions to emerge, a suitable approach to modelling longitudinal literature data is to use topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time, or do not share the topics over epochs when they model time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that the topics are shared across epochs and across the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiments and the source code of the model freely available to aid other researchers in analysing the results or applying the model to their own data collections. },
        FILE = { :beykikhoshk_etal_dsaa16analysing - Analysing the History of Autism Spectrum Disorder Using Topic Models.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.01 },
    }
C
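    A rough stand-in for the longitudinal view above, assuming scikit-learn: fit one fixed set of topics on the whole corpus, then track each topic's average share per epoch. This mimics only the "topics shared across epochs" idea; it is not the recurrent Chinese restaurant franchise.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["gene expression autism", "autism diagnosis criteria",
            "brain imaging study", "imaging genetics autism"]
    epochs = np.array([0, 0, 1, 1])          # publication epoch per document

    X = CountVectorizer().fit_transform(docs)
    theta = LatentDirichletAllocation(n_components=2,
                                      random_state=0).fit_transform(X)
    for e in (0, 1):
        print("epoch", e, "mean topic share:",
              theta[epochs == e].mean(axis=0).round(2))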
  • A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), July 2016. [ | ]
    @ARTICLE { budhaditya_gupta_phung_venkatesh_jbhi16framework,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2016 },
        MONTH = { July },
        FILE = { :budhaditya_gupta_phung_venkatesh_jbhi16framework - A Framework for Mixed Type Multi Outcome Prediction with Applications in Healthcare.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
J
  • Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In IEEE Intl. Conf. on Multimedia and Expo (ICME), Seattle, USA, July 2016. [ | ]
    The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply a non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals at different levels of affective disorders. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework for utilizing social media as a sensor of mood and emotional transitions. This work might suggest the basis of new systems to screen individuals and communities at high risk of mental health problems in online settings.
    @INPROCEEDINGS { dao_nguyen_venkatesh_phung_icme16,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities },
        BOOKTITLE = { IEEE Intl. Conf. on Multimedia and Expo (ICME) },
        YEAR = { 2016 },
        ADDRESS = { Seattle, USA },
        MONTH = { July },
        PUBLISHER = { IEEE },
        ABSTRACT = { The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply a non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals at different levels of affective disorders. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework for utilizing social media as a sensor of mood and emotional transitions. This work might suggest the basis of new systems to screen individuals and communities at high risk of mental health problems in online settings. },
        FILE = { :dao_nguyen_venkatesh_phung_icme16 - Discovering Latent Affective Transitions among Individuals in Online Mental Health­related Communities..pdf:PDF },
        OWNER = { dbdao },
        TIMESTAMP = { 2016.03.20 },
    }
C
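    A minimal sketch, on synthetic counts, of the factorisation step described above: non-negative matrix factorisation of per-individual affect-transition counts into latent transition patterns and loadings. The common/individual factor split of the paper is not reproduced.

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    # Rows: individuals; columns: flattened 6x6 emotion-to-emotion transition counts.
    X = rng.poisson(2.0, size=(50, 36)).astype(float)

    nmf = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
    W = nmf.fit_transform(X)   # per-individual loadings on latent patterns
    H = nmf.components_        # latent transition patterns
    print(W.shape, H.shape)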
  • Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records
    Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge-Based Systems (KBS), 99(1):168-182, May 2016. [ | | pdf]
    The Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the form of the ICD-10 tree structure, which presents semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF by using an MCMC technique. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-day readmission prediction. Besides these, the prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy.
    @ARTICLE { li_rana_phung_venkatesh_kbs16hierarchical,
        AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Hierarchical {B}ayesian nonparametric models for knowledge discovery from electronic medical records },
        JOURNAL = { Knowledge-Based Systems (KBS) },
        YEAR = { 2016 },
        VOLUME = { 99 },
        NUMBER = { 1 },
        PAGES = { 168--182 },
        MONTH = { May },
        ISSN = { 0950-7051 },
        ABSTRACT = { The Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the form of the ICD-10 tree structure, which presents semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF by using an MCMC technique. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-day readmission prediction. Besides these, the prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy. },
        DOI = { http://dx.doi.org/10.1016/j.knosys.2016.02.005 },
        FILE = { :li_rana_phung_venkatesh_kbs16hierarchical - Hierarchical Bayesian Nonparametric Models for Knowledge Discovery from Electronic Medical Records.pdf:PDF },
        KEYWORDS = { Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction },
        URL = { http://www.sciencedirect.com/science/article/pii/S0950705116000836 },
    }
J
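    A hedged sketch of the decay idea above: down-weight the association between two diagnosis codes as their distance in the ICD-10 hierarchy grows. The toy distance (unshared prefix length) and the decay scale are our assumptions, not the paper's exact choices.

    import numpy as np

    def tree_distance(a, b):
        # Toy proxy for ICD-10 tree distance: unshared prefix length of two codes.
        shared = 0
        for x, y in zip(a, b):
            if x != y:
                break
            shared += 1
        return (len(a) - shared) + (len(b) - shared)

    def decay(a, b, scale=2.0):
        # Exponential decay f(d) = exp(-d / scale) of code-to-code association.
        return np.exp(-tree_distance(a, b) / scale)

    print(decay("E11.9", "E11.2"))   # nearby codes -> weight near 1
    print(decay("E11.9", "J44.0"))   # distant codes -> weight near 0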
  • Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes
    Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D.. In 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA), pages 128-140, April 2016. [ | ]
    The hierarchical Dirichlet process (HDP) was originally designed and experimented with for a single data channel. In this paper we enhance its ability to model heterogeneous data using a richer structure for the base measure being a product-space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data, resulting in different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We provided analysis of the activities and patterns learned from our model, visualized, compared and contrasted with the ground truth to demonstrate the merit of the proposed framework. We further quantitatively evaluated and reported its performance using standard metrics including F1-score, NMI, RI, and purity. We also compared the performance of the PS-HDP model with those of popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem encountered in pervasive and ubiquitous computing applications.
    @INPROCEEDINGS { nguyen_nguyen_venkatesh_phung_mlsda16learning,
        AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },
        TITLE = { Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical {D}irichlet Processes },
        BOOKTITLE = { 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA) },
        YEAR = { 2016 },
        PAGES = { 128--140 },
        MONTH = { April },
        ABSTRACT = { The hierarchical Dirichlet process (HDP) was originally designed and experimented with for a single data channel. In this paper we enhance its ability to model heterogeneous data using a richer structure for the base measure being a product-space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data, resulting in different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We provided analysis of the activities and patterns learned from our model, visualized, compared and contrasted with the ground truth to demonstrate the merit of the proposed framework. We further quantitatively evaluated and reported its performance using standard metrics including F1-score, NMI, RI, and purity. We also compared the performance of the PS-HDP model with those of popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem encountered in pervasive and ubiquitous computing applications. },
        FILE = { :nguyen_nguyen_venkatesh_phung_mlsda16learning - Learning Multi Faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
    }
C
  • Neural Choice by Elimination via Highway Networks
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques, April 2016. [ | ]
    We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods.
    @INPROCEEDINGS { tran_phung_venkatesh_bmd16neural,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Neural Choice by Elimination via Highway Networks },
        BOOKTITLE = { 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques },
        YEAR = { 2016 },
        MONTH = { April },
        ABSTRACT = { We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods. },
        FILE = { :tran_phung_venkatesh_bmd16neural - Neural Choice by Elimination Via Highway Networks.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
    }
C
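    A small sketch of the elimination strategy described above: starting from the full item set, repeatedly drop the least worthy remaining item, so the reverse elimination order is a ranking. The softmax sampling and random scores stand in for the paper's learned rank function and Gompertz-utility derivation.

    import numpy as np

    def rank_by_elimination(scores, rng):
        remaining = list(range(len(scores)))
        eliminated = []
        while len(remaining) > 1:
            s = np.array([-scores[i] for i in remaining])   # low score = eliminate first
            p = np.exp(s - s.max())
            p /= p.sum()                                    # softmax over remaining items
            eliminated.append(remaining.pop(rng.choice(len(remaining), p=p)))
        return remaining + eliminated[::-1]                 # best item first

    rng = np.random.default_rng(0)
    print(rank_by_elimination(np.array([0.2, 1.5, -0.3, 0.9]), rng))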
  • DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 30-41, April 2016. [ | | pdf]
    Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy.
    @CONFERENCE { pham_tran_phung_venkatesh_pakdd16deepcare,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { {DeepCare}: A Deep Dynamic Memory Model for Predictive Medicine },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        VOLUME = { 9652 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 30--41 },
        MONTH = { April },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy. },
        DOI = { 10.1007/978-3-319-31750-2_3 },
        FILE = { :pham_tran_phung_venkatesh_pakdd16deepcare - DeepCare_ a Deep Dynamic Memory Model for Predictive Medicine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31750-2_3 },
    }
C
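    The time parameterization highlighted in the DeepCare abstract can be illustrated in isolation: the forget gate of an LSTM is attenuated as the gap between consecutive records grows. A minimal Python sketch follows; the decay 1/log(e + Δt) is one plausible parameterization, not necessarily the exact form used by DeepCare.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def time_decayed_forget(f_gate, delta_t):
            """Attenuate an LSTM forget-gate activation by elapsed time.

            f_gate  : standard forget-gate activation in (0, 1)
            delta_t : time (e.g., days) since the previous medical record
            The decay 1/log(e + delta_t) shrinks retained memory as the
            gap between observations grows (an assumed parameterization).
            """
            decay = 1.0 / np.log(np.e + delta_t)
            return decay * f_gate

        # Memory retained after a 1-day vs. a 180-day gap, same gate value.
        f = sigmoid(0.8)
        print(time_decayed_forget(f, 1.0), time_decayed_forget(f, 180.0))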
  • Sparse Adaptive Multi-Hyperplane Machine
    Nguyen, Khanh, Le, Trung, Nguyen, Vu and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 27-39, April 2016. [ | | pdf]
    The Adaptive Multiple-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it offers no principled way to tune the complexity and sparsity levels of the solution. Addressing sparsity is important to improve learning generalization, prediction accuracy and computational speedup. In this paper, we employ the max-margin principle and a sparse approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective function with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning methods and the original AMM, our proposed Sparse AMM provides machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid overfitting and underfitting. We validate our approach on several large benchmark datasets. We show that with the ability to control sparsity, the proposed Sparse AMM yields superior classification accuracy to the original AMM while simultaneously achieving computational speedup.
    @CONFERENCE { nguyen_le_nguyen_phung_pakdd16sparse,
        AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Phung, Dinh },
        TITLE = { Sparse Adaptive Multi-Hyperplane Machine },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        VOLUME = { 9651 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 27--39 },
        MONTH = { April },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The Adaptive Multiple-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it offers no principled way to tune the complexity and sparsity levels of the solution. Addressing sparsity is important to improve learning generalization, prediction accuracy and computational speedup. In this paper, we employ the max-margin principle and a sparse approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective function with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning methods and the original AMM, our proposed Sparse AMM provides machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid overfitting and underfitting. We validate our approach on several large benchmark datasets. We show that with the ability to control sparsity, the proposed Sparse AMM yields superior classification accuracy to the original AMM while simultaneously achieving computational speedup. },
        DOI = { 10.1007/978-3-319-31753-3_3 },
        FILE = { :nguyen_le_nguyen_phung_pakdd16sparse - Sparse Adaptive Multi Hyperplane Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_3 },
    }
C
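    The SAMM abstract does not spell out its sparsity mechanism, but the standard way to control sparsity in SGD-based training, and a reasonable mental model for the sparsity knob the paper tunes, is a soft-threshold (l1 proximal) step after each gradient update. A generic Python sketch, not the paper's exact objective:

        import numpy as np

        def soft_threshold(W, tau):
            """Proximal operator of the l1 norm: shrinks weights toward zero."""
            return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

        def sgd_l1_step(W, grad, lr, lam):
            """One SGD step followed by an l1 proximal step.

            A generic sparsity-controlled update: `lam` plays the role of
            the sparsity knob; larger values zero out more weights.
            """
            W = W - lr * grad  # gradient step on the (hinge-style) loss
            return soft_threshold(W, lr * lam)

        W = np.random.randn(5, 3)  # e.g., 3 hyperplanes per class
        grad = np.random.randn(5, 3)
        W = sgd_l1_step(W, grad, lr=0.1, lam=2.0)
        print(np.mean(W == 0.0))  # fraction of zeroed weights grows with lam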
  • Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework
    Li, Cheng, Gupta, Sunil, Rana, Santu, Luo, Wei, Venkatesh, Svetha, Ashely, David and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 152-164, April 2016. [ | | pdf]
    Treatments of cancer cause severe side effects called toxicities. Reduction of such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time series data as toxicities can be caused by one treatment on a given day alone, and thus it is necessary to consider the effect of the singular data vector causing toxicity. We model the data before prediction points using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer types. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. Our method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy in terms of AUC than state-of-the-art baselines.
    @CONFERENCE { li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity,
        AUTHOR = { Li, Cheng and Gupta, Sunil and Rana, Santu and Luo, Wei and Venkatesh, Svetha and Ashely, David and Phung, Dinh },
        TITLE = { Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        PAGES = { 152--164 },
        MONTH = { April },
        PUBLISHER = { Springer },
        ABSTRACT = { Treatments of cancer cause severe side effects called toxicities. Reduction of such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time series data as toxicities can be caused by one treatment on a given day alone, and thus it is necessary to consider the effect of the singular data vector causing toxicity. We model the data before prediction points using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer types. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. Our method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy in terms of AUC than state-of-the-art baselines. },
        DOI = { 10.1007/978-3-319-31753-3_13 },
        FILE = { :li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity - Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi Task Framework.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_13 },
    }
C
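    The bag construction described above is simple to picture: each prediction point owns a bag of daily-treatment instance vectors, and the bag-level risk comes from pooling instance scores. The sketch below uses max pooling and a plain linear scorer as hypothetical stand-ins for the paper's Bayesian multi-task predictor:

        import numpy as np

        def bag_score(instances, w, pooling=np.max):
            """Score one bag of daily-treatment instances.

            instances : (n_days, n_features) matrix; one row per day of
                        treatment and patient attributes
            w         : linear instance scorer (hypothetical stand-in)
            Max pooling lets a single high-risk day dominate the bag,
            matching the idea that one treatment alone can cause toxicity.
            """
            return pooling(instances @ w)

        bag = np.random.randn(14, 8)  # a fortnight of daily feature vectors
        w = np.random.randn(8)
        print(bag_score(bag, w))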
  • Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(1):157-188, April 2016. [ | | pdf]
    @ARTICLE { tran_phung_venkatesh_kais16,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 47 },
        NUMBER = { 1 },
        PAGES = { 157--188 },
        MONTH = { April },
        DOI = { 10.1007/s10115-015-0840-9 },
        FILE = { :tran_phung_venkatesh_kais16 - Modelling Human Preferences for Ranking and Collaborative Filtering_ a Probabilistic Ordered Partition Approach.pdf:PDF },
        KEYWORDS = { Preference learning, Learning-to-rank, Collaborative filtering, Probabilistic ordered partition model, Set-based ranking, Probabilistic reasoning },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.03.02 },
        URL = { http://link.springer.com/article/10.1007%2Fs10115-015-0840-9 },
    }
J
  • Consistency of the Health of the Nation Outcome Scales (HoNOS) at inpatient-to-community transition
    Luo, Wei, Harvey, Richard, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha and Connor, Jason P. BMJ open, 6(4):e010732, April 2016. [ | | pdf]
    Objectives The Health of the Nation Outcome Scales (HoNOS) are mandated outcome-measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess if setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care and community centres. Setting A regional mental health service with both acute and community facilities. Participants 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with the median period between assessments being 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures The difference of HoNOS at inpatient-discharge and community-intake was assessed with Pearson correlation, Cohen's κ and effect size. Results Inpatient-discharge HoNOS was on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22), and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). Pearson correlation between two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions Systematic change in the HoNOS occurs at inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient HoNOS and community HoNOS scores.
    @ARTICLE { luo_harvey_tran_phung_venkatesh_connor_bmj16consistency,
        AUTHOR = { Luo, Wei and Harvey, Richard and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha and Connor, Jason P },
        TITLE = { Consistency of the Health of the Nation Outcome Scales ({HoNOS}) at inpatient-to-community transition },
        JOURNAL = { BMJ open },
        YEAR = { 2016 },
        VOLUME = { 6 },
        NUMBER = { 4 },
        PAGES = { e010732 },
        MONTH = { April },
        ABSTRACT = { Objectives The Health of the Nation Outcome Scales (HoNOS) are mandated outcome-measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess if setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care and community centres. Setting A regional mental health service with both acute and community facilities. Participants 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with the median period between assessments being 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures The difference of HoNOS at inpatient-discharge and community-intake was assessed with Pearson correlation, Cohen's κ and effect size. Results Inpatient-discharge HoNOS was on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22), and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). Pearson correlation between two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions Systematic change in the HoNOS occurs at inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient HoNOS and community HoNOS scores. },
        DOI = { 10.1136/bmjopen-2015-010732 },
        FILE = { :luo_harvey_tran_phung_venkatesh_connor_bmj16consistency - Consistency of the Health of the Nation Outcome Scales (HoNOS) at Inpatient to Community Transition.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        PUBLISHER = { British Medical Journal Publishing Group },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://bmjopen.bmj.com/content/6/4/e010732.full },
    }
J
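    The consistency analysis above is straightforward to reproduce on paired scores. A minimal Python sketch with made-up (hypothetical) score vectors, using the same two agreement statistics the study reports; the paper's confidence intervals and effect sizes are omitted:

        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.metrics import cohen_kappa_score

        # Hypothetical paired HoNOS totals at discharge and intake.
        discharge = np.array([7, 5, 9, 12, 3, 8, 10, 6])
        intake = np.array([12, 9, 14, 11, 8, 13, 15, 10])

        r, p = pearsonr(discharge, intake)
        kappa = cohen_kappa_score(discharge, intake)  # totals as categories
        print(f"mean increase = {np.mean(intake - discharge):.2f}")
        print(f"Pearson r = {r:.3f} (p = {p:.3f}), Cohen's kappa = {kappa:.3f}")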
  • A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression
    Saha, Budhaditya, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), PP(99):1-1, March 2016. [ | | pdf]
    Mental illness has a deep impact on individuals, families, and by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Using machine learning techniques, we have formulated a joint modelling framework to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines.
    @ARTICLE { budhaditya_nguyen_phung_venkatesh_bhi16framework,
        AUTHOR = { Saha, Budhaditya and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2016 },
        VOLUME = { PP },
        NUMBER = { 99 },
        PAGES = { 1-1 },
        MONTH = { March },
        ISSN = { 2168-2194 },
        ABSTRACT = { Mental illness has a deep impact on individuals, families, and by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Using machine learning techniques, we have formulated a joint modelling framework to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines. },
        DOI = { 10.1109/JBHI.2016.2543741 },
        FILE = { :budhaditya_nguyen_phung_venkatesh_bhi16framework - A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression.pdf:PDF },
        KEYWORDS = { Blogs;Correlation;Covariance matrices;Feature extraction;Informatics;Media;Pragmatics },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7436759&tag=1 },
    }
J
  • A new transfer learning framework with application to model-agnostic multi-task learning
    Gupta, Sunil, Rana, Santu, Saha, Budhaditya, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), February 2016. [ | | pdf]
    Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve the performance is through exploiting knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve the performance through jointly modeling multiple related tasks. Although there exist numerous classification or regression models in the machine learning literature, most of the MTL models are built around ridge or logistic regression. There exist some limited works which propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.
    @ARTICLE { gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer,
        AUTHOR = { Gupta, Sunil and Rana, Santu and Saha, Budhaditya and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A new transfer learning framework with application to model-agnostic multi-task learning },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        PAGES = { 1--41 },
        MONTH = { February },
        ISSN = { 0219-3116 },
        ABSTRACT = { Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve the performance is through exploiting knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve the performance through jointly modeling multiple related tasks. Although there exist numerous classification or regression models in the machine learning literature, most of the MTL models are built around ridge or logistic regression. There exist some limited works which propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines. },
        DOI = { 10.1007/s10115-016-0926-z },
        FILE = { :gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer - A New Transfer Learning Framework with Application to Model Agnostic Multi Task Learning.pdf:PDF },
        KEYWORDS = { Multi-task learning Model-agnostic framework Meta algorithm Classification Regression },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://dx.doi.org/10.1007/s10115-016-0926-z },
    }
J
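    The core algorithmic idea above, improving poor task-parameter estimates by sharing data between tasks judged related, can be sketched at a meta level around any off-the-shelf learner. The following is an illustrative Python sketch only: the cosine-similarity relatedness estimate and the threshold are stand-ins for the paper's principled estimator.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def mtl_by_data_sharing(tasks, relatedness_threshold=0.5):
            """Model-agnostic MTL sketch: augment each task with data from
            tasks whose single-task coefficient estimates point the same
            way, then refit the base learner on the augmented data.

            tasks : list of (X, y) pairs, one per task
            """
            coefs = []
            for X, y in tasks:
                m = LogisticRegression(max_iter=1000).fit(X, y)
                coefs.append(m.coef_.ravel())
            models = []
            for t, (X, y) in enumerate(tasks):
                Xa, ya = [X], [y]
                for s, (Xs, ys) in enumerate(tasks):
                    if s == t:
                        continue
                    sim = coefs[t] @ coefs[s] / (np.linalg.norm(coefs[t])
                                                 * np.linalg.norm(coefs[s]) + 1e-12)
                    if sim > relatedness_threshold:  # share related data
                        Xa.append(Xs)
                        ya.append(ys)
                models.append(LogisticRegression(max_iter=1000)
                              .fit(np.vstack(Xa), np.concatenate(ya)))
            return models

        rng = np.random.default_rng(0)
        tasks = [(rng.normal(size=(40, 6)), rng.integers(0, 2, 40))
                 for _ in range(3)]
        models = mtl_by_data_sharing(tasks)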
  • Multiple Task Transfer Learning with Small Sample Sizes
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 46(2):315-342, Feb. 2016. [ | | pdf]
    Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data-level and joint parameter learning.
    @ARTICLE { budhaditya_gupta_venkatesh_phung_kais16multiple,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Multiple Task Transfer Learning with Small Sample Sizes },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 46 },
        NUMBER = { 2 },
        PAGES = { 315--342 },
        MONTH = { Feb. },
        ABSTRACT = { Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data-level and joint parameter learning. },
        DOI = { 10.1007/s10115-015-0821-z },
        FILE = { :budhaditya_gupta_venkatesh_phung_kais16multiple - Multiple Task Transfer Learning with Small Sample Sizes.pdf:PDF },
        KEYWORDS = { Multi-task Transfer learning Optimization Healthcare Data mining Statistical analysis },
        OWNER = { dinh },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://link.springer.com/article/10.1007/s10115-015-0821-z },
    }
J
  • Stabilizing L1-norm Prediction Models by Supervised Feature Grouping
    Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 59(C):149-168, Feb. 2016. [ | | pdf]
    Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But in the presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making.
    @ARTICLE { kamkar_gupta_phung_venkatesh_16stabilizing,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing L1-norm Prediction Models by Supervised Feature Grouping },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2016 },
        VOLUME = { 59 },
        NUMBER = { C },
        PAGES = { 149--168 },
        MONTH = { Feb. },
        ISSN = { 1532-0464 },
        ABSTRACT = { Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But in the presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. },
        DOI = { 10.1016/j.jbi.2015.11.012 },
        FILE = { :kamkar_gupta_phung_venkatesh_16stabilizing - Stabilizing L1 Norm Prediction Models by Supervised Feature Grouping.pdf:PDF },
        KEYWORDS = { Feature selection, Lasso, Stability, Supervised feature grouping },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046415002804 },
    }
J
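    The instability that motivates this work is measurable: resample the data, rerun Lasso, and compare the selected feature sets. A small Python sketch of such a stability score (average pairwise Jaccard similarity over bootstraps) follows; the paper's supervised feature grouping is designed to push this score up, though the grouping model itself is not reproduced here.

        import numpy as np
        from sklearn.linear_model import Lasso

        def selection_stability(X, y, alpha=0.1, n_boot=20, seed=0):
            """Average pairwise Jaccard similarity of the feature sets
            Lasso selects across bootstrap resamples -- the instability
            that supervised feature grouping aims to reduce.
            """
            rng = np.random.default_rng(seed)
            n = len(y)
            supports = []
            for _ in range(n_boot):
                idx = rng.integers(0, n, n)  # bootstrap resample
                coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
                supports.append(set(np.flatnonzero(coef)))
            sims = [len(a & b) / max(len(a | b), 1)
                    for i, a in enumerate(supports)
                    for b in supports[i + 1:]]
            return float(np.mean(sims))

        X = np.random.randn(100, 20)
        y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * np.random.randn(100)
        print(selection_stability(X, y))  # closer to 1.0 = more stable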
  • Graph-induced restricted Boltzmann machines for document modeling
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Information Sciences, 328(C):60-75, Jan. 2016. [ | | pdf]
    Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – where the underlying graphical model is an undirected bipartite graph. Inference is efficient – document representation can be computed with a single matrix projection, making RBMs suitable for massive text corpora available today. Standard RBMs, however, operate on the bag-of-words assumption, ignoring the inherent underlying relational structures among words. This results in less coherent word thematic grouping. We introduce graph-based regularization schemes that exploit the linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves the group coherence, facilitates visualization, provides means for estimation of intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
    @ARTICLE { nguyen_tran_phung_venkatesh_jis16graph,
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Graph-induced restricted {B}oltzmann machines for document modeling },
        JOURNAL = { Information Sciences },
        YEAR = { 2016 },
        VOLUME = { 328 },
        NUMBER = { C },
        PAGES = { 60--75 },
        MONTH = { Jan. },
        ABSTRACT = { Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – where the underlying graphical model is an undirected bipartite graph. Inference is efficient – document representation can be computed with a single matrix projection, making RBMs suitable for massive text corpora available today. Standard RBMs, however, operate on the bag-of-words assumption, ignoring the inherent underlying relational structures among words. This results in less coherent word thematic grouping. We introduce graph-based regularization schemes that exploit the linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves the group coherence, facilitates visualization, provides means for estimation of intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy. },
        DOI = { 10.1016/j.ins.2015.08.023 },
        FILE = { :nguyen_tran_phung_venkatesh_jis16graph - Graph Induced Restricted Boltzmann Machines for Document Modeling.pdf:PDF },
        KEYWORDS = { Document modeling, Feature group discovery, Restricted Boltzmann machine, Topic coherence, Word graphs },
        OWNER = { dinh },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://dx.doi.org/10.1016/j.ins.2015.08.023 },
    }
J
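    One way to read the graph-based regularization above is as a Laplacian penalty that ties together the weight rows of linked words. A hedged Python sketch of just that penalty term follows; it mirrors the idea of the paper, not its exact training scheme.

        import numpy as np

        def graph_penalty_grad(W, L):
            """Gradient of the graph regularizer tr(W^T L W), where L is
            the Laplacian of a word graph (built from corpus statistics
            or domain knowledge).

            Adding `lam * graph_penalty_grad(W, L)` to the usual RBM
            weight gradient pulls the rows of linked words together.
            """
            return 2.0 * L @ W  # d/dW tr(W^T L W) = 2 L W for symmetric L

        # Toy word graph: words 0 and 1 are linked.
        A = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
        L = np.diag(A.sum(1)) - A
        W = np.random.randn(3, 4)  # (n_words, n_hidden) RBM weights
        W -= 0.01 * graph_penalty_grad(W, L)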
2015
  • Differentiating sub-groups of online depression-related communities using textual cues
    Nguyen, Thin, O'Dea, Bridianne, Larsen, Mark, Phung, Dinh, Venkatesh, Svetha and Christensen, Helen. In Intl. Conf. on Web Information Systems Engineering (WISE), pages 216-224, Dec. 2015. [ | | pdf]
    Depression is a highly prevalent mental illness and is a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed, or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled. Five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Good predictive validity in depression classification using topics and psycholinguistic clues as features was found. Clear discrimination between writing styles and content, with good predictive power, is an important step in understanding social media and its use in mental health.
    @INPROCEEDINGS { nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating,
        AUTHOR = { Nguyen, Thin and O'Dea, Bridianne and Larsen, Mark and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },
        TITLE = { Differentiating sub-groups of online depression-related communities using textual cues },
        BOOKTITLE = { Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2015 },
        VOLUME = { 9419 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 216--224 },
        MONTH = { Dec. },
        PUBLISHER = { Springer },
        ABSTRACT = { Depression is a highly prevalent mental illness and is a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed, or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled. Five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Good predictive validity in depression classification using topics and psycholinguistic clues as features was found. Clear discrimination between writing styles and content, with good predictive power, is an important step in understanding social media and its use in mental health. },
        DOI = { 10.1007/978-3-319-26187-4_17 },
        FILE = { :nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating - Differentiating Sub Groups of Online Depression Related Communities Using Textual Cues.pdf:PDF },
        ISBN = { 978-3-319-11748-5 },
        KEYWORDS = { Web community; Feature extraction; Textual cues; Online depression },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-26187-4_17 },
    }
C
  • Using Twitter to learn about the autism community
    Beykikhoshk, Adham, Arandjelović, Ognjen, Phung, Dinh, Venkatesh, Svetha and Caelli, Terry. Social Network Analysis and Mining (SNAM), 5(1):1-17, December 2015. [ | | pdf]
    Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.
    @ARTICLE { beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'c}, Ognjen and Phung, Dinh and Venkatesh, Svetha and Caelli, Terry },
        TITLE = { Using {T}witter to learn about the autism community },
        JOURNAL = { Social Network Analysis and Mining (SNAM) },
        YEAR = { 2015 },
        VOLUME = { 5 },
        NUMBER = { 1 },
        PAGES = { 1--17 },
        MONTH = { December },
        ABSTRACT = { Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work. },
        DOI = { 10.1007/s13278-015-0261-5 },
        FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using - Using Twitter to Learn about the Autism Community.pdf:PDF },
        KEYWORDS = { Social media, Big data, Asperger’s, Mental health, Health care, Public health, ASD },
        OWNER = { dinh },
        PUBLISHER = { Springer Vienna },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://dx.doi.org/10.1007/s13278-015-0261-5 },
    }
J
  • Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines
    Vellanki, Pratibha, Phung, Dinh, Duong, Thi and Venkatesh, Svetha. In Trends and Applications in Knowledge Discovery and Data Mining, pages 245-257, Cham, Nov. 2015. [ | | pdf]
    @INPROCEEDINGS { vellanki_phung_duong_venkatesh_pakdd2015learning,
        AUTHOR = { Vellanki, Pratibha and Phung, Dinh and Duong, Thi and Venkatesh, Svetha },
        TITLE = { Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted {B}oltzmann Machines },
        BOOKTITLE = { Trends and Applications in Knowledge Discovery and Data Mining },
        YEAR = { 2015 },
        VOLUME = { 9441 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 245--257 },
        ADDRESS = { Cham },
        MONTH = { Nov. },
        PUBLISHER = { Springer },
        DOI = { 10.1007/978-3-319-25660-3_21 },
        FILE = { :vellanki_phung_duong_venkatesh_pakdd2015learning - Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines.pdf:PDF },
        ISBN = { 978-3-319-25660-3 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://dx.doi.org/10.1007/978-3-319-25660-3_21 },
    }
C
  • Multi-View Subspace Clustering for Face Images
    Zhang, Xin, Phung, Dinh, Venkatesh, Svetha, Pham, Duc-Son and Liu, Wanquan. In Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA), pages 1-7, Nov. 2015. [ | | pdf]
    In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which violates its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Due to the integration of affinity matrices for each view, this global affinity matrix can best represent the relationship between clusters. This could help us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets.
    @INPROCEEDINGS { zhang_phung_venkatesh_pham_liu_dicta15multiview,
        AUTHOR = { Zhang, Xin and Phung, Dinh and Venkatesh, Svetha and Pham, Duc-Son and Liu, Wanquan },
        TITLE = { Multi-View Subspace Clustering for Face Images },
        BOOKTITLE = { Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA) },
        YEAR = { 2015 },
        PAGES = { 1-7 },
        MONTH = { Nov. },
        ABSTRACT = { In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which violates its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Due to the integration of affinity matrices for each view, this global affinity matrix can best represent the relationship between clusters. This could help us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets. },
        DOI = { 10.1109/DICTA.2015.7371289 },
        FILE = { :zhang_phung_venkatesh_pham_liu_dicta15multiview - Multi View Subspace Clustering for Face Images.pdf:PDF },
        KEYWORDS = { computer vision;face recognition;pattern clustering;ADMM framework;SSC;affinity matrices;alternating direction method;computer vision applications;computer vision method;convergent algorithm;face clustering;face images;global affinity matrix;latent structure;linear separability assumption;multicamera surveillance;multipliers;multiview data;multiview face datasets;multiview subspace clustering algorithms;sparse subspace clustering performance;visual sensors;Cameras;Clustering algorithms;Computer vision;Face;Loss measurement;Matrix decomposition;Sparse matrices },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7371289 },
    }
C
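    The unified latent structure described above plays the role of a single affinity matrix shared by all views. As a rough stand-in for the paper's ADMM-based formulation, the Python sketch below simply averages per-view affinities and feeds the result to spectral clustering:

        import numpy as np
        from sklearn.cluster import SpectralClustering

        def fuse_views(affinities):
            """Average per-view affinity matrices into one global affinity.

            The paper learns the shared structure jointly via ADMM; a
            plain average is the simplest illustrative substitute.
            """
            return sum(affinities) / len(affinities)

        views = [np.random.rand(30, 30) for _ in range(3)]
        views = [(A + A.T) / 2 for A in views]  # symmetrize each view
        labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                                    random_state=0).fit_predict(fuse_views(views))
        print(labels[:10])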
  • Streaming Variational Inference for Dirichlet Process Mixtures
    Huynh, V., Phung, D. and Venkatesh, S.. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf]
    Bayesian nonparametric models are theoretically suitable for learning from streaming data because their model complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.
    @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,
        AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },
        TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },
        BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2015 },
        PAGES = { 237--252 },
        MONTH = { Nov. },
        ABSTRACT = { Bayesian nonparametric models are theoretically suitable for learning from streaming data because their model complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data. },
        FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },
    }
C
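    The truncation-free flavour of the hard-clustering (ME) algorithm can be illustrated with a DP-means-style one-pass rule: a new cluster is opened whenever no existing cluster is close enough, so the number of clusters is never truncated in advance. The Python sketch below is only an analogy for TFME, not the algorithm itself; `new_cluster_cost` is a hypothetical knob.

        import numpy as np

        def stream_cluster(stream, new_cluster_cost=4.0):
            """One-pass hard clustering: assign each point to its nearest
            cluster mean, or open a new cluster when every existing one
            is farther than `new_cluster_cost`. The number of clusters
            grows with the data -- no truncation is fixed up front.
            """
            means, counts = [], []
            for x in stream:
                if means:
                    d = [np.linalg.norm(x - m) for m in means]
                    k = int(np.argmin(d))
                if not means or d[k] > new_cluster_cost:
                    means.append(x.astype(float))
                    counts.append(1)
                else:
                    counts[k] += 1  # online update of the cluster mean
                    means[k] += (x - means[k]) / counts[k]
            return means, counts

        rng = np.random.default_rng(0)
        data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
        means, counts = stream_cluster(data)
        print(len(means), counts)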
  • Understanding toxicities and complications of cancer treatment: A data mining approach
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 431-443, Nov 2015. [ | | pdf]
    @INPROCEEDINGS { nguyen_luo_phung_venkatesh_ai15understanding,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Understanding toxicities and complications of cancer treatment: A data mining approach },
        BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },
        YEAR = { 2015 },
        EDITOR = { Pfahringer, Bernhard and Renz, Jochen },
        VOLUME = { 9457 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 431--443 },
        MONTH = { Nov },
        PUBLISHER = { Springer International Publishing },
        DOI = { 10.1007/978-3-319-26350-2_38 },
        FILE = { :nguyen_luo_phung_venkatesh_ai15understanding - Understanding Toxicities and Complications of Cancer Treatment_ a Data Mining Approach.pdf:PDF },
        LOCATION = { Canberra, ACT, Australia },
        OWNER = { ngdang },
        TIMESTAMP = { 2015.09.15 },
        URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_38 },
    }
C
  • Stable Feature Selection with Support Vector Machines
    Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 298-308, Cham, Nov. 2015. [ | | pdf]
    The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining SVM with an l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, the l1-norm SVM shows instability in selecting features in the presence of correlated features. We propose a new method to increase the stability of the l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods.
    @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_ai15stable,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable Feature Selection with {S}upport {V}ector {M}achines },
        BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },
        YEAR = { 2015 },
        EDITOR = { Pfahringer, Bernhard and Renz, Jochen },
        VOLUME = { 9457 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 298--308 },
        ADDRESS = { Cham },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining SVM with an l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, the l1-norm SVM shows instability in selecting features in the presence of correlated features. We propose a new method to increase the stability of the l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods. },
        DOI = { 10.1007/978-3-319-26350-2_26 },
        FILE = { :kamkar_gupta_phung_venkatesh_ai15stable - Stable Feature Selection with Support Vector Machines.pdf:PDF },
        ISBN = { 978-3-319-26350-2 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_26 },
    }
C
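    The covariance-based coupling of feature weights described above can be pictured with an explicit penalty: positively correlated features are pulled toward equal weights, negatively correlated ones toward opposite weights. The Python sketch below evaluates such a surrogate penalty; it is not the paper's exact regularizer or its alternating-minimization solver.

        import numpy as np

        def correlation_penalty(w, R):
            """Penalty that ties weights of correlated features together.

            For each pair (i, j), positive correlation pulls w_i toward
            w_j and negative correlation pulls w_i toward -w_j -- a simple
            surrogate for a covariance regularizer on an l1-norm SVM.
            """
            n = len(w)
            total = 0.0
            for i in range(n):
                for j in range(i + 1, n):
                    total += abs(R[i, j]) * (w[i] - np.sign(R[i, j]) * w[j]) ** 2
            return total

        X = np.random.randn(200, 5)
        X[:, 1] = X[:, 0] + 0.05 * np.random.randn(200)  # correlated pair
        R = np.corrcoef(X, rowvar=False)
        print(correlation_penalty(np.array([1.0, -1.0, 0.2, 0.0, 0.3]), R))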
  • Exploiting Feature Relationships Towards Stable Feature Selection
    Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. [ | | pdf]
    Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods.
    @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_dsaa15,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Exploiting Feature Relationships Towards Stable Feature Selection },
        BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2015 },
        PAGES = { 1--10 },
        ADDRESS = { Paris, France },
        MONTH = { Oct. },
        ABSTRACT = { Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods. },
        DOI = { 10.1109/DSAA.2015.7344859 },
        FILE = { :kamkar_gupta_phung_venkatesh_dsaa15 - Exploiting Feature Relationships Towards Stable Feature Selection.pdf:PDF },
        KEYWORDS = { convex programming;covariance matrices;feature selection;Lasso stability;convex formulation;correlated feature;feature covariance matrix;feature relationship;feature selection method;negative correlation;optimization algorithm;positive feature correlation;predictive model;real-world data;shrinkage;stable feature selection;synthetic data;Correlation;Covariance matrices;Linear programming;Optimization;Predictive models;Stability criteria;Correlated features;Lasso;Prediction;Stability },
        OWNER = { ikamkar },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7344859 },
    }
C
  • Nonparametric Discovery of Online Mental Health-Related Communities
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. (IEEE CIS Travel Grants Award). [ | | pdf]
    @INPROCEEDINGS { dao_nguyen_venkatesh_phung_dsaa15,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Nonparametric Discovery of Online Mental Health-Related Communities },
        BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2015 },
        PAGES = { 1-10 },
        ADDRESS = { Paris, France },
        MONTH = { Oct. },
        PUBLISHER = { IEEE },
        NOTE = { IEEE CIS Travel Grants Award },
        DOI = { 10.1109/DSAA.2015.7344841 },
        FILE = { :dao_nguyen_venkatesh_phung_dsaa15 - Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },
        KEYWORDS = { cognition;health care;nonparametric statistics;pattern clustering;social networking (online);cognitive dynamics;mood swings patterns;nonparametric clustering;nonparametric discovery;nonparametric topic modelling;online communities;online mental health-related communities;social media;Autism;Blogs;Media;Mood;Sentiment analysis;Variable speed drives;Mental Health;Moods and Emotion;Nonparametric Discovery;Online Communities;Social Media;Topics },
        OWNER = { dbdao },
        TIMESTAMP = { 2015.07.23 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7344841 },
    }
C
  • Mixed-norm sparse representation for multi view face recognition
    Zhang, Xin, Pham, Duc-Son, Venkatesh, Svetha, Liu, Wanquan and Phung, Dinh. Pattern Recognition, 48(9):2935-2946, Sep. 2015. [ | | pdf]
    @ARTICLE { zhang_pham_venkatesh_liu_phung_pr15mixed,
        AUTHOR = { Zhang, Xin and Pham, Duc-Son and Venkatesh, Svetha and Liu, Wanquan and Phung, Dinh },
        TITLE = { Mixed-norm sparse representation for multi view face recognition },
        JOURNAL = { Pattern Recognition },
        YEAR = { 2015 },
        VOLUME = { 48 },
        NUMBER = { 9 },
        PAGES = { 2935--2946 },
        MONTH = { Sep. },
        DOI = { 10.1016/j.patcog.2015.02.022 },
        FILE = { :zhang_pham_venkatesh_liu_phung_pr15mixed - Mixed Norm Sparse Representation for Multi View Face Recognition.pdf:PDF },
        KEYWORDS = { ADMM, Convex optimization, Group sparse representation, Joint dynamic sparse representation classification, Multi-pose face recognition, Multi-task learning, Robust face recognition, Sparse representation classification, Unsupervised learning },
        OWNER = { dinh },
        PUBLISHER = { Pergamon },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://dl.acm.org/citation.cfm?id=2792197 },
    }
J
  • Overcoming Data Scarcity of Twitter: Using Tweets As Bootstrap with Application to Autism-Related Topic Content Analysis
    Beykikhoshk, Adham, Arandjelovi\'{c}, Ognjen, Phung, Dinh and Venkatesh, Svetha. In IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM), pages 1354-1361, New York, NY, USA, Aug. 2015. [ | | pdf]
    Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
    @INPROCEEDINGS { beykikhoshk_arandjelovic_phung_venkatesh_asonam15overcoming,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Overcoming Data Scarcity of {T}witter: Using Tweets As Bootstrap with Application to Autism-Related Topic Content Analysis },
        BOOKTITLE = { IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM) },
        YEAR = { 2015 },
        SERIES = { ASONAM '15 },
        PAGES = { 1354--1361 },
        ADDRESS = { New York, NY, USA },
        MONTH = { Aug. },
        PUBLISHER = { ACM },
        ABSTRACT = { Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. },
        ACMID = { 2808908 },
        DOI = { 10.1145/2808797.2808908 },
        FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_asonam15overcoming - Overcoming Data Scarcity of Twitter_ Using Tweets As Bootstrap with Application to Autism Related Topic Content Analysis.pdf:PDF },
        ISBN = { 978-1-4503-3854-7 },
        LOCATION = { Paris, France },
        NUMPAGES = { 8 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://doi.acm.org/10.1145/2808797.2808908 },
    }
C
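    A minimal sketch of the bootstrapping step described in this entry: pull URLs out of tweet texts and turn each linked page into a richer document for downstream topic modelling. The example tweets, the crude HTML stripping and the error handling are placeholders, not the authors' pipeline.

    import re
    import urllib.request

    URL_RE = re.compile(r"https?://\S+")

    tweets = [
        "New review on early ASD screening https://example.org/asd-review",
        "Great discussion today, no link here",
    ]

    def linked_documents(tweets):
        # Each fetched page becomes a document tied back to its originating tweet.
        docs = []
        for tweet in tweets:
            for url in URL_RE.findall(tweet):
                try:
                    html = urllib.request.urlopen(url, timeout=5).read()
                except OSError:
                    continue  # dead or unreachable link: skip it
                text = re.sub(r"<[^>]+>", " ", html.decode("utf-8", "ignore"))
                docs.append({"tweet": tweet, "url": url, "text": text})
        return docs

    print(len(linked_documents(tweets)), "web documents collected for topic modelling")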
  • Autism Blogs: Expressed Emotion, Language Styles and Concerns in Personal and Community Settings
    Nguyen, Thin, Duong, Thi, Venkatesh, Svetha and Phung, Dinh. IEEE Transactions on Affective Computing (TAC), 6(3):312-323, July 2015. [ | | pdf]
    The Internet has provided an increasingly popular platform for individuals to voice their thoughts and for like-minded people to share stories. This unintentionally leaves traces of the characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are one such case, for whom the Internet could facilitate even more communication, given that its social-spatial distance is a characteristic preference for individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in the prediction of autism blogs in both personal and community settings.
    @ARTICLE { nguyen_duong_venkatesh_phung_tac15,
        AUTHOR = { Nguyen, Thin and Duong, Thi and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Autism Blogs: Expressed Emotion, Language Styles and Concerns in Personal and Community Settings },
        JOURNAL = { IEEE Transactions on Affective Computing (TAC) },
        YEAR = { 2015 },
        VOLUME = { 6 },
        NUMBER = { 3 },
        PAGES = { 312-323 },
        MONTH = { July },
        ISSN = { 1949-3045 },
        ABSTRACT = { The Internet has provided an increasingly popular platform for individuals to voice their thoughts and for like-minded people to share stories. This unintentionally leaves traces of the characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are one such case, for whom the Internet could facilitate even more communication, given that its social-spatial distance is a characteristic preference for individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in the prediction of autism blogs in both personal and community settings. },
        DOI = { 10.1109/TAFFC.2015.2400912 },
        FILE = { :nguyen_duong_venkatesh_phung_tac15 - Autism Blogs_ Expressed Emotion, Language Styles and Concerns in Personal and Community Settings.pdf:PDF },
        KEYWORDS = { Web sites;human factors;learning (artificial intelligence);statistical analysis;Internet;affective information;autism blogs;blog content discrimination;community setting;control blogs;language styles;machine learning;online autism communities;personal setting;social-spatial distance;statistical methods;topics;Autism;Blogs;Communities;Educational institutions;Feature extraction;Sociology;Variable speed drives;Affective norms;affective norms;autism;language styles;psychological health;topics },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7034996 },
    }
J
  • Stabilized Sparse Ordinal Regression for Medical Risk Stratification
    Tran, Truyen, Phung, Dinh, Luo, Wei and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 43(3):555-582, June 2015. [ | | pdf]
    The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability.
    @ARTICLE { tran_phung_luo_venkatesh_kais15stabilized,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Venkatesh, Svetha },
        TITLE = { Stabilized Sparse Ordinal Regression for Medical Risk Stratification },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2015 },
        VOLUME = { 43 },
        NUMBER = { 3 },
        PAGES = { 555--582 },
        MONTH = { June },
        ABSTRACT = { The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. },
        DOI = { 10.1007/s10115-014-0740-4 },
        FILE = { :Tran2015_Article_StabilizedSparseOrdinalRegress.pdf:PDF },
        KEYWORDS = { Medical risk stratification Sparse ordinal regression Stability Feature graph Electronic medical record },
        OWNER = { dinh },
        TIMESTAMP = { 2014.01.28 },
        URL = { http://link.springer.com/article/10.1007%2Fs10115-014-0740-4 },
    }
J
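    A minimal sketch of the stabilization device used above, in its simplest squared-loss form rather than the paper's ordinal model: a feature graph induces a Laplacian penalty w'Lw that pulls the weights of related features together. The toy graph and regularization strength are illustrative.

    import numpy as np

    def graph_laplacian(adjacency):
        return np.diag(adjacency.sum(axis=1)) - adjacency

    def fit_graph_stabilized(X, y, adjacency, lam=1.0, eps=1e-6):
        # argmin_w ||y - Xw||^2 + lam * w' L w  (eps keeps the system well posed)
        L = graph_laplacian(adjacency)
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * L + eps * np.eye(p), X.T @ y)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 4))
    y = X @ np.array([1.0, 1.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

    # Declare features 0 and 1 related; the penalty shrinks their weights together.
    A = np.zeros((4, 4))
    A[0, 1] = A[1, 0] = 1.0
    print(np.round(fit_graph_stabilized(X, y, A, lam=10.0), 3))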
  • A predictive framework for modeling healthcare data with evolving clinical interventions
    Rana, Santu, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. The ASA Data Science Journal Statistical Analysis and Data Mining, 8(3):162-182, June 2015. [ | | pdf]
    Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute their impact by building a single prediction rule that amalgamates interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split into temporal windows, and for each window a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional on both the latent grouping and the patients' condition, through a Bayesian logistic regression. Learning separate distributions for each time window results in an over-complex model when interventions do not change in every window. We show that by replacing the HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks.
    @ARTICLE { rana_gupta_phung_venkatesh_sdm15predictive,
        AUTHOR = { Rana, Santu and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A predictive framework for modeling healthcare data with evolving clinical interventions },
        JOURNAL = { The ASA Data Science Journal Statistical Analysis and Data Mining },
        YEAR = { 2015 },
        VOLUME = { 8 },
        NUMBER = { 3 },
        PAGES = { 162--182 },
        MONTH = { June },
        ABSTRACT = { Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute their impact by building a single prediction rule that amalgamates interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split into temporal windows, and for each window a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional on both the latent grouping and the patients' condition, through a Bayesian logistic regression. Learning separate distributions for each time window results in an over-complex model when interventions do not change in every window. We show that by replacing the HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks. },
        DOI = { 10.1002/sam.11262 },
        FILE = { :rana_gupta_phung_venkatesh_sdm15predictive - A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions.pdf:PDF },
        KEYWORDS = { data mining, machine learning, healthcare data modeling },
        OWNER = { dinh },
        PUBLISHER = { Wiley Subscription Services, Inc., A Wiley Company },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://dx.doi.org/10.1002/sam.11262 },
    }
J
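    A minimal sketch standing in for the latent intervention grouping above: scikit-learn's truncated Dirichlet-process mixture is fitted separately to the intervention features of each temporal window, and the effective number of groups is read off the posterior mixture weights. This finite-truncation, window-independent stand-in omits the paper's dynamic HDP coupling across windows.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(2)
    # Toy one-dimensional intervention features for three yearly windows.
    windows = [rng.normal(loc=2.0 * rng.integers(0, 3, size=(80, 1)), scale=0.3)
               for _ in range(3)]

    for t, Z in enumerate(windows):
        dpgmm = BayesianGaussianMixture(
            n_components=10,  # truncation level of the DP
            weight_concentration_prior_type="dirichlet_process",
            random_state=0,
        ).fit(Z)
        k = int(np.sum(dpgmm.weights_ > 0.02))  # groups with non-trivial mass
        print(f"window {t}: about {k} latent intervention groups")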
  • Stabilizing High-Dimensional Prediction Models Using Feature Graphs
    Gopakumar, Shivapratap, Tran, Truyen, Nguyen, Tu, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), 19(3):1044-1052, May 2015. [ | | pdf]
    We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization.
    @ARTICLE { gopakumar_tran_nguyen_phung_venkatesh_bhi15stabilizing,
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Nguyen, Tu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing High-Dimensional Prediction Models Using Feature Graphs },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2015 },
        VOLUME = { 19 },
        NUMBER = { 3 },
        PAGES = { 1044--1052 },
        MONTH = { May },
        ISSN = { 2168-2194 },
        ABSTRACT = { We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization. },
        DOI = { 10.1109/JBHI.2014.2353031 },
        FILE = { :gopakumar_tran_nguyen_phung_venkatesh_bhi15stabilizing - Stabilizing High Dimensional Prediction Models Using Feature Graphs.pdf:PDF },
        KEYWORDS = { Laplace equations;cardiology;diseases;electronic health records;feature selection;graphs;medical diagnostic computing;regression analysis;Laplacian-based regularization;clinical prognosis;diseases;feature graph stabilization;goodness-of-fit;heart failure;hierarchic relations;high-dimensional electronic medical records;hospital events;interventions;regression model;selected features;stabilizing high-dimensional prediction models;temporal relations;Data models;Feature extraction;Heart;Indexes;Predictive models;Stability criteria;Biomedical computing;electronic medical records;predictive models;stability },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6887285 },
    }
J
  • A Bayesian Nonparametric Approach to Multilevel Regression
    Nguyen, V., Phung, D., Venkatesh, S. and Bui, H.H.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 330-342, May 2015. [ | | pdf]
    Regression is a cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare, to name a few fields. We present a Bayesian nonparametric framework for multilevel regression where individuals, including observations and outcomes, are organized into groups. Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.
    @INPROCEEDINGS { nguyen_phung_venkatesh_bui_pakdd15,
        AUTHOR = { Nguyen, V. and Phung, D. and Venkatesh, S. and Bui, H.H. },
        TITLE = { A {B}ayesian Nonparametric Approach to Multilevel Regression },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        PAGES = { 330--342 },
        MONTH = { May },
        ABSTRACT = { Regression is a cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare, to name a few fields. We present a Bayesian nonparametric framework for multilevel regression where individuals, including observations and outcomes, are organized into groups. Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model. },
        DOI = { 10.1007/978-3-319-18038-0_26 },
        FILE = { :nguyen_phung_venkatesh_bui_pakdd15 - A Bayesian Nonparametric Approach to Multilevel Regression.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007%2F978-3-319-18038-0_26 },
    }
C
  • Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature
    Beykikhoshk, Adham, Arandjelovi{\'{c}}, Ognjen, Venkatesh, Svetha and Phung, Dinh. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 550-562, Ho Chi Minh City, Vietnam, May 2015. [ | | pdf]
    In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on a medical literature corpus concerned with autism spectrum disorder (ASD) – a research subject of rapidly growing social and healthcare importance. In addition to the collected ASD literature corpus, which we have made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.
    @INPROCEEDINGS { beykikhoshk_arandjelovic_venkatesh_phung_pakdd15,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'{c}}, Ognjen and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Hierarchical {D}irichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9077 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 550--562 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on a medical literature corpus concerned with autism spectrum disorder (ASD) – a research subject of rapidly growing social and healthcare importance. In addition to the collected ASD literature corpus, which we have made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature. },
        DOI = { 10.1007/978-3-319-18038-0_43 },
        FILE = { :beykikhoshk_arandjelovic_venkatesh_phung_pakdd15 - Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://dx.doi.org/10.1007/978-3-319-18038-0_43 },
    }
C
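    A minimal sketch of the temporal similarity graph from step (iii) above: topics learned per epoch (random stand-ins here) are connected across adjacent epochs when their word distributions are similar enough, so that splits and merges appear as nodes with several incident edges. The cosine measure and threshold are illustrative choices.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(3)
    vocab, n_topics = 50, 4

    # Epoch-wise topic-word distributions; later epochs drift from earlier ones.
    epochs = [rng.dirichlet(np.ones(vocab), size=n_topics)]
    for _ in range(2):
        drift = np.abs(epochs[-1] + rng.normal(scale=0.005, size=(n_topics, vocab)))
        epochs.append(drift / drift.sum(axis=1, keepdims=True))

    # Edge whenever a topic in epoch t is close to a topic in epoch t+1.
    threshold = 0.9
    edges = [((t, i), (t + 1, j))
             for t in range(len(epochs) - 1)
             for i, a in enumerate(epochs[t])
             for j, b in enumerate(epochs[t + 1])
             if cosine(a, b) >= threshold]
    print(len(edges), "edges in the temporal similarity graph")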
  • Stabilizing Sparse Cox Model using Statistic and Semantic Structures in Electronic Medical Records
    Gopakumar, Shivapratap, Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 331-343, Ho Chi Minh City, Vietnam, May 2015. (Runner-up Best Student Paper Award). [ | | pdf]
    Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-events using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features, (ii) aggregation of the Jaccard similarity graph and a recently introduced semantic EMR graph, and (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real-world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures – the Consistency index and signal-to-noise ratio (SNR) – the use of our proposed methods significantly increased feature stability when compared with the baselines.
    @INPROCEEDINGS { gopakumar_nguyen_tran_phung_venkatesh_pakdd15,
        AUTHOR = { Gopakumar, Shivapratap and Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing Sparse {C}ox Model using Statistic and Semantic Structures in Electronic Medical Records },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9078 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 331--343 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        NOTE = { Runner-up Best Student Paper Award },
        ABSTRACT = { Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-events using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features, (ii) aggregation of the Jaccard similarity graph and a recently introduced semantic EMR graph, and (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real-world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures – the Consistency index and signal-to-noise ratio (SNR) – the use of our proposed methods significantly increased feature stability when compared with the baselines. },
        DOI = { 10.1007/978-3-319-18032-8_26 },
        FILE = { :gopakumar_nguyen_tran_phung_venkatesh_pakdd15 - Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007%2F978-3-319-18032-8_26 },
    }
C
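    A minimal sketch of the first of the three feature graphs above: pairwise Jaccard similarity between binary EMR feature columns, which could then serve as the adjacency matrix of a stabilizing graph penalty. The toy data and the duplicated column are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    X = (rng.random((200, 6)) < 0.3).astype(int)  # toy binary EMR features
    X[:, 1] = X[:, 0]                             # feature 1 duplicates feature 0

    def jaccard_graph(X):
        # A[i, j] = |rows where both features fire| / |rows where either fires|
        p = X.shape[1]
        A = np.zeros((p, p))
        for i in range(p):
            for j in range(i + 1, p):
                both = np.sum((X[:, i] == 1) & (X[:, j] == 1))
                either = np.sum((X[:, i] == 1) | (X[:, j] == 1))
                A[i, j] = A[j, i] = both / either if either else 0.0
        return A

    print(np.round(jaccard_graph(X), 2))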
  • Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning
    Gupta, Sunil Kumar, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 303-316, Ho Chi Minh City, Vietnam, May 2015. (Best Paper Award). [ | | pdf]
    Multi-task learning offers a way to benefit from synergy of multiple related prediction tasks via their joint modeling. Current multi-task techniques model related tasks jointly, assuming that the tasks share the same relationship across features uniformly. This assumption is seldom true as tasks may be related across some features but not others. Addressing this problem, we propose a new multi-task learning model that learns separate task relationships along different features. This added flexibility allows our model to have a finer and differential level of control in joint modeling of tasks along different features. We formulate the model as an optimization problem and provide an efficient, iterative solution. We illustrate the behavior of the proposed model using a synthetic dataset where we induce varied feature-dependent task relationships: positive relationship, negative relationship, no relationship. Using four real datasets, we evaluate the effectiveness of the proposed model for many multi-task regression and classification problems, and demonstrate its superiority over other state-of-the-art multi-task learning models.
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_pakdd15,
        AUTHOR = { Gupta, Sunil Kumar and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9077 },
        PAGES = { 303--316 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        NOTE = { Best Paper Award },
        ABSTRACT = { Multi-task learning offers a way to benefit from synergy of multiple related prediction tasks via their joint modeling. Current multi-task techniques model related tasks jointly, assuming that the tasks share the same relationship across features uniformly. This assumption is seldom true as tasks may be related across some features but not others. Addressing this problem, we propose a new multi-task learning model that learns separate task relationships along different features. This added flexibility allows our model to have a finer and differential level of control in joint modeling of tasks along different features. We formulate the model as an optimization problem and provide an efficient, iterative solution. We illustrate the behavior of the proposed model using a synthetic dataset where we induce varied feature-dependent task relationships: positive relationship, negative relationship, no relationship. Using four real datasets, we evaluate the effectiveness of the proposed model for many multi-task regression and classification problems, and demonstrate its superiority over other state-of-the-art multi-task learning models. },
        DOI = { 10.1007/978-3-319-18038-0_24 },
        FILE = { :gupta_rana_phung_venkatesh_pakdd15 - Collaborating Differently on Different Topics_ a Multi Relational Approach to Multi Task Learning.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18038-0_24 },
    }
C
  • Learning Conditional Latent Structures from Multiple Data Sources
    Huynh, V., Phung, D., Nguyen, X.L., Venkatesh, S. and Bui, H.H.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 343-354, May 2015. [ | | pdf]
    Data usually come from heterogeneous sources. When dealing with multiple data sources, existing models often treat them independently and thus cannot explicitly model the correlation structures among data sources. To address this problem, we propose a fully Bayesian nonparametric approach to model correlation structures among multiple and heterogeneous datasets. The proposed framework first induces a mixture distribution over the primary data source using hierarchical Dirichlet processes (HDP). Conditioned on each atom (group) discovered in the previous step, the context data sources are mutually independent and each is generated from a hierarchical Dirichlet process. In each specific application, which covariates constitute content or context(s) is determined by the nature of the data. We also derive efficient inference and exploit the conditional independence structure to propose a (conditional) parallel Gibbs sampling scheme. We demonstrate our model on the problem of latent activity discovery in pervasive computing using mobile data. We show the advantage of utilizing multiple data sources in terms of exploratory analysis as well as quantitative clustering performance.
    @INPROCEEDINGS { huynh_phung_nguyen_venkatesh_bui_pakdd15,
        AUTHOR = { Huynh, V. and Phung, D. and Nguyen, X.L. and Venkatesh, S. and Bui, H.H. },
        TITLE = { Learning Conditional Latent Structures from Multiple Data Sources },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        PAGES = { 343--354 },
        MONTH = { May },
        ABSTRACT = { Data usually come from heterogeneous sources. When dealing with multiple data sources, existing models often treat them independently and thus cannot explicitly model the correlation structures among data sources. To address this problem, we propose a fully Bayesian nonparametric approach to model correlation structures among multiple and heterogeneous datasets. The proposed framework first induces a mixture distribution over the primary data source using hierarchical Dirichlet processes (HDP). Conditioned on each atom (group) discovered in the previous step, the context data sources are mutually independent and each is generated from a hierarchical Dirichlet process. In each specific application, which covariates constitute content or context(s) is determined by the nature of the data. We also derive efficient inference and exploit the conditional independence structure to propose a (conditional) parallel Gibbs sampling scheme. We demonstrate our model on the problem of latent activity discovery in pervasive computing using mobile data. We show the advantage of utilizing multiple data sources in terms of exploratory analysis as well as quantitative clustering performance. },
        DOI = { 10.1007/978-3-319-18038-0_27 },
        FILE = { :huynh_phung_nguyen_venkatesh_bui_pakdd15 - Learning Conditional Latent Structures from Multiple Data Sources.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18038-0_27 },
    }
C
  • Fast One-Class Support Vector Machine for Novelty Detection
    Le, Trung, Phung, Dinh, Nguyen, Khanh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 189-200, Ho Chi Minh City, Vietnam, May 2015. [ | | pdf]
    Novelty detection arises as an important learning task in several applications. The kernel-based approach to novelty detection has been widely used due to its theoretical rigor and the elegance of its geometric interpretation. However, computational complexity is a major obstacle in this approach. In this paper, leveraging the cutting-plane framework with the well-known One-Class Support Vector Machine, we present new solutions that can scale up seamlessly with data. The first solution is exact and linear when viewed through the cutting-plane lens; the second employs a sampling strategy that, remarkably, has a constant computational complexity defined relative to the probability of approximation accuracy. Several datasets are benchmarked to demonstrate the credibility of our framework.
    @INPROCEEDINGS { le_phung_nguyen_venkatesh_pakdd15,
        AUTHOR = { Le, Trung and Phung, Dinh and Nguyen, Khanh and Venkatesh, Svetha },
        TITLE = { Fast {O}ne-{C}lass {S}upport {V}ector {M}achine for Novelty Detection },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9078 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 189--200 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Novelty detection arises as an important learning task in several applications. The kernel-based approach to novelty detection has been widely used due to its theoretical rigor and the elegance of its geometric interpretation. However, computational complexity is a major obstacle in this approach. In this paper, leveraging the cutting-plane framework with the well-known One-Class Support Vector Machine, we present new solutions that can scale up seamlessly with data. The first solution is exact and linear when viewed through the cutting-plane lens; the second employs a sampling strategy that, remarkably, has a constant computational complexity defined relative to the probability of approximation accuracy. Several datasets are benchmarked to demonstrate the credibility of our framework. },
        DOI = { 10.1007/978-3-319-18032-8_15 },
        FILE = { :le_phung_nguyen_venkatesh_pakdd15 - Fast One Class Support Vector Machine for Novelty Detection.pdf:PDF },
        KEYWORDS = { One-class Support Vector Machine, Novelty detection, Large-scale dataset },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18032-8_15 },
    }
C
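    A minimal sketch contrasting the exact kernel One-Class SVM with a crude fixed-budget variant trained on a random subsample, to illustrate the scaling trade-off this entry addresses. This is not the authors' cutting-plane algorithm; the sampling here is plain uniform subsampling and all parameters are illustrative.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5000, 2))                           # nominal training data
    X_test = np.vstack([rng.normal(size=(50, 2)),            # nominal
                        rng.normal(loc=6.0, size=(50, 2))])  # novelties

    full = OneClassSVM(nu=0.05, gamma=0.5).fit(X)

    m = 500  # fixed budget, so training cost stays constant as len(X) grows
    sub = OneClassSVM(nu=0.05, gamma=0.5).fit(X[rng.choice(len(X), m, replace=False)])

    agree = np.mean(full.predict(X_test) == sub.predict(X_test))
    print("agreement between exact and subsampled detectors:", agree)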
  • Small-Variance Asymptotics for Bayesian Nonparametric Models with Constraints
    Li, C., Rana, S., Phung, D. and Venkatesh, S.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 92-105, May 2015. [ | | pdf]
    Users often have additional knowledge when Bayesian nonparametric (BNP) models are employed: for clustering, there may be prior knowledge that some data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint); similarly, for topic modeling, some words should be grouped together or separately because of underlying semantics. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique for BNP models via Gibbs sampling is time-consuming and does not scale to large data. Variational approximations are faster but often do not offer good solutions. Addressing this, we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for the Dirichlet process mixture model with constraints and devise a simple and efficient K-means-type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similarly simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms.
    @INPROCEEDINGS { li_rana_phung_venkatesh_pakdd15,
        AUTHOR = { Li, C. and Rana, S. and Phung, D. and Venkatesh, S. },
        TITLE = { Small-Variance Asymptotics for {B}ayesian Nonparametric Models with Constraints },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        PAGES = { 92--105 },
        MONTH = { May },
        ABSTRACT = { Users often have additional knowledge when Bayesian nonparametric (BNP) models are employed: for clustering, there may be prior knowledge that some data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint); similarly, for topic modeling, some words should be grouped together or separately because of underlying semantics. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique for BNP models via Gibbs sampling is time-consuming and does not scale to large data. Variational approximations are faster but often do not offer good solutions. Addressing this, we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for the Dirichlet process mixture model with constraints and devise a simple and efficient K-means-type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similarly simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms. },
        DOI = { 10.1007/978-3-319-18032-8_8 },
        FILE = { :li_rana_phung_venkatesh_pakdd15 - Small Variance Asymptotics for Bayesian Nonparametric Models with Constraints.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18032-8_8 },
    }
C
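    A minimal sketch of the small-variance limit itself: DP-means, the K-means-type algorithm a Dirichlet process mixture collapses to, where a point farther than the penalty lambda (in squared distance) from every centre spawns a new cluster. The constraint handling from the paper is omitted here.

    import numpy as np

    def dp_means(X, lam, n_iter=20):
        centres = [X.mean(axis=0)]
        for _ in range(n_iter):
            labels = []
            for x in X:
                d2 = [np.sum((x - c) ** 2) for c in centres]
                if min(d2) > lam:
                    centres.append(x.copy())  # too far from all centres: new cluster
                    labels.append(len(centres) - 1)
                else:
                    labels.append(int(np.argmin(d2)))
            labels = np.array(labels)
            centres = [X[labels == k].mean(axis=0) for k in np.unique(labels)]
        # Final assignment against the converged centres.
        labels = np.array([int(np.argmin([np.sum((x - c) ** 2) for c in centres]))
                           for x in X])
        return centres, labels

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(loc=c, scale=0.2, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
    centres, labels = dp_means(X, lam=1.0)
    print(len(centres), "clusters found")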
  • Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset
    Luo, Wei, Nguyen, Thin, Nichols, Melanie, Tran, Truyen, Rana, Santu, Gupta, Sunil, Phung, Dinh, Venkatesh, Svetha and Allender, Steve. PLOS ONE, 10(5):1-13, May 2015. [ | | pdf]
    For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
    @ARTICLE { luo_nguyen_nichols_tran_rana_gupta_phung_venkatesh_allender_pone15demography,
        AUTHOR = { Luo, Wei and Nguyen, Thin and Nichols, Melanie and Tran, Truyen and Rana, Santu and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha and Allender, Steve },
        TITLE = { Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset },
        JOURNAL = { PLOS ONE },
        YEAR = { 2015 },
        VOLUME = { 10 },
        NUMBER = { 5 },
        PAGES = { 1-13 },
        MONTH = { May },
        ABSTRACT = { For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease. },
        DOI = { 10.1371/journal.pone.0125602 },
        FILE = { :luo_nguyen_nichols_tran_rana_gupta_phung_venkatesh_allender_pone15demography - Is Demography Destiny.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0125602 },
    }
J
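    A minimal sketch of the validation design above, with synthetic stand-ins for the ACS demographic profiles and NCD prevalence values: fit on a subset of states, then check the correlation between predictions and observations on states completely held out of model development. The learner and split are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    n_states, n_features = 50, 8
    X = rng.normal(size=(n_states, n_features))  # stand-in demographic profiles
    y = 0.6 * X[:, 0] - 0.4 * X[:, 3] + rng.normal(scale=0.2, size=n_states)

    states = rng.permutation(n_states)
    train, held_out = states[:35], states[35:]

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], y[train])
    pred = model.predict(X[held_out])
    print("correlation on held-out states:",
          round(float(np.corrcoef(pred, y[held_out])[0, 1]), 3))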
  • What shall I share and with Whom? - A Multi-Task Learning Formulation using Multi-Faceted Task Relationships
    Gupta, Sunil, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. In SIAM Intl. Conf. on Data Mining (SDM), pages 703-711, Vancouver, Canada, May 2015. [ | | pdf]
    Multi-task learning is a learning paradigm that improves the performance of "related" tasks through their joint learning. To do this, each task answers the question: "Which other tasks should I share with?" This task relatedness can be complex - a task may be related to one set of tasks based on one subset of features and to other tasks based on other subsets. Existing multi-task learning methods do not explicitly model this reality, learning a single-faceted task relationship over all the features. This degrades performance by forcing a task to become similar to other tasks even on their unrelated features. Addressing this gap, we propose a novel multi-task learning model that learns multi-faceted task relationships, allowing tasks to collaborate differentially on different feature subsets. This is achieved by simultaneously learning a low-dimensional subspace for task parameters and inducing task groups over each latent subspace basis using a novel combination of L_{1} and pairwise L_{\infty} norms. Further, our model can induce grouping across both positively and negatively related tasks, which helps towards exploiting knowledge from all types of related tasks. We validate our model on two synthetic and five real datasets, and show significant performance improvements over several state-of-the-art multi-task learning techniques. Thus our model effectively answers for each task: What shall I share and with whom?
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm15,
        AUTHOR = { Gupta, Sunil and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { What shall I share and with Whom? - A Multi-Task Learning Formulation using Multi-Faceted Task Relationships },
        BOOKTITLE = { SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2015 },
        PAGES = { 703-711 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { May },
        ABSTRACT = { Multi-task learning is a learning paradigm that improves the performance of "related" tasks through their joint learning. To do this, each task answers the question: "Which other tasks should I share with?" This task relatedness can be complex - a task may be related to one set of tasks based on one subset of features and to other tasks based on other subsets. Existing multi-task learning methods do not explicitly model this reality, learning a single-faceted task relationship over all the features. This degrades performance by forcing a task to become similar to other tasks even on their unrelated features. Addressing this gap, we propose a novel multi-task learning model that learns multi-faceted task relationships, allowing tasks to collaborate differentially on different feature subsets. This is achieved by simultaneously learning a low dimensional subspace for task parameters and inducing task groups over each latent subspace basis using a novel combination of L_{1} and pairwise L_{\infty} norms. Further, our model can induce grouping across both positively and negatively related tasks, which helps towards exploiting knowledge from all types of related tasks. We validate our model on two synthetic and five real datasets, and show significant performance improvements over several state-of-the-art multi-task learning techniques. Thus our model effectively answers for each task: What shall I share and with whom? },
        DOI = { 10.1137/1.9781611974010.79 },
        FILE = { :gupta_rana_phung_venkatesh_sdm15 - What Shall I Share and with Whom_ a Multi Task Learning Formulation Using Multi Faceted Task Relationships.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611974010.79 },
    }
C
  • Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines
    Tran, Truyen, Nguyen, Tu, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 54:96-105, April 2015. [ | | pdf]
    Electronic medical records (EMR) offer promise for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR data are complex – they contain temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via a restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto a 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduce two constraints on the model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines.
    @ARTICLE { tran_nguyen_phung_venkatesh_bi15learning,
        AUTHOR = { Tran, Truyen and Nguyen, Tu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Learning vector representation of medical objects via {EMR}-driven nonnegative restricted {B}oltzmann machines },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2015 },
        VOLUME = { 54 },
        PAGES = { 96--105 },
        MONTH = { April },
        ABSTRACT = { Electronic medical records (EMR) offer promise for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR data are complex – they contain temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via a restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto a 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduce two constraints on the model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines. },
        DOI = { 10.1016/j.jbi.2015.01.012 },
        FILE = { :tran_nguyen_phung_venkatesh_bi15learning - Learning Vector Representation of Medical Objects Via EMR Driven Nonnegative Restricted Boltzmann Machines.pdf:PDF },
        KEYWORDS = { Electronic medical records, Vector representation, Medical objects embedding, Feature grouping, Suicide risk stratification },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046415000143 },
    }
J
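The nonnegativity idea above is easy to prototype. Below is a minimal, illustrative sketch (not the paper's implementation): a binary RBM trained with one step of contrastive divergence (CD-1) whose weights are projected onto the nonnegative orthant after every update, with an optional graph-Laplacian term standing in for the structural-smoothness constraint. All sizes, data and the smoothness graph are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01, smooth=0.0, L=None):
    """One CD-1 update; W is projected onto the nonnegative orthant."""
    h0 = sigmoid(v0 @ W + c)                 # hidden activations given data
    v1 = sigmoid(h0 @ W.T + b)               # one-step reconstruction
    h1 = sigmoid(v1 @ W + c)
    grad_W = v0.T @ h0 - v1.T @ h1           # CD-1 gradient estimate
    if L is not None:                        # optional graph-Laplacian smoothness
        grad_W -= smooth * (L @ W)
    W = np.maximum(W + lr * grad_W, 0.0)     # nonnegativity projection
    b = b + lr * (v0 - v1).sum(axis=0)
    c = c + lr * (h0 - h1).sum(axis=0)
    return W, b, c

# Toy usage: 20 binary "EMR" features, 8 hidden units (all hypothetical).
V, H = 20, 8
W = rng.random((V, H)) * 0.01
b, c = np.zeros(V), np.zeros(H)
data = (rng.random((100, V)) < 0.2).astype(float)
for _ in range(5):
    W, b, c = cd1_step(data, W, b, c)
print(W.min() >= 0.0)                        # True: weights stay nonnegative
```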
  • Topic Model Kernel Classification With Probabilistically Reduced Features
    Nguyen, Vu, Phung, Dinh and Venkatesh, Svetha. Journal of Data Science (JDS), 13(2):323-340, April 2015. [ | | pdf]
    Probabilistic topic models have become a standard in modern machine learning for a wide range of applications. Representing data through the reduced-dimensional mixture proportions extracted from topic models is not only richer in semantic interpretation, but can also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification of data that has been processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real-world datasets. TMK outperforms existing kernels on distributional features and gives comparable results on nonprobabilistic data types. (An illustrative kernel sketch follows this entry.)
    @ARTICLE { nguyen_phung_venkatesh_jds15,
        AUTHOR = { Nguyen, Vu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Topic Model Kernel Classification With Probabilistically Reduced Features },
        JOURNAL = { Journal of Data Science (JDS) },
        YEAR = { 2015 },
        VOLUME = { 13 },
        NUMBER = { 2 },
        PAGES = { 323-340 },
        MONTH = { April },
        ABSTRACT = { Probabilistic topic models have become a standard in modern machine learning for a wide range of applications. Representing data through the reduced-dimensional mixture proportions extracted from topic models is not only richer in semantic interpretation, but can also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification of data that has been processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real-world datasets. TMK outperforms existing kernels on distributional features and gives comparable results on nonprobabilistic data types. },
        FILE = { :nguyen_phung_venkatesh_jds15 - Topic Model Kernel Classification with Probabilistically Reduced Features.pdf:PDF },
        KEYWORDS = { Topic Models, Bayesian Nonparametric, Support Vector Machine, Kernel Method, Classification, Dimensionality Reduction },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://www.jds-online.com/file_download/496/6-new.pdf },
    }
J
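As a concrete illustration of a topic-based kernel, the sketch below builds a Gram matrix of the form exp(-JSD(p, q)/sigma) over per-document topic proportions and feeds it to an SVM with a precomputed kernel. The exact TMK definition is given in the paper; the exp(-JSD/sigma) form, the sigma value and the toy data here are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tmk_gram(X, Y, sigma=1.0):
    """Gram matrix K[i, j] = exp(-JSD(X[i], Y[j]) / sigma); sigma is a bandwidth."""
    K = np.empty((len(X), len(Y)))
    for i, p in enumerate(X):
        for j, q in enumerate(Y):
            K[i, j] = np.exp(-js_divergence(p, q) / sigma)
    return K

# Toy usage: rows are per-document topic proportions (each sums to 1).
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(10), size=60)
y = (X[:, 0] > 0.1).astype(int)
clf = SVC(kernel="precomputed").fit(tmk_gram(X, X), y)
print(clf.predict(tmk_gram(X[:5], X)))
```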
  • Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance
    Nguyen, Vu, Phung, Dinh, Pham, Duc-Son and Venkatesh, Svetha. Annals of Data Science (AoDS), 2(1):21-41, March 2015. [ | | pdf]
    In data science, anomaly detection is the process of identifying items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision and security management communities, discovering suspicious events is the key issue in abnormality detection for video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. The crucial challenge, however, is that the number of coherent segments in a surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.
    @ARTICLE { nguyen_phung_pham_venkatesh_aods15bayesian,
        AUTHOR = { Nguyen, Vu and Phung, Dinh and Pham, Duc-Son and Venkatesh, Svetha },
        TITLE = { {B}ayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance },
        JOURNAL = { Annals of Data Science (AoDS) },
        YEAR = { 2015 },
        VOLUME = { 2 },
        NUMBER = { 1 },
        PAGES = { 21--41 },
        MONTH = { March },
        ABSTRACT = { In data science, anomaly detection is the process of identifying items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision and security management communities, discovering suspicious events is the key issue in abnormality detection for video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. The crucial challenge, however, is that the number of coherent segments in a surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events. },
        DOI = { 10.1007/s40745-015-0030-3 },
        FILE = { :nguyen_phung_pham_venkatesh_aods15bayesian - Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance.pdf:PDF },
        KEYWORDS = { Abnormality detection, Bayesian nonparametric, User interface, Multilevel data structure, Video segmentation, Spatio-temporal browsing },
        OWNER = { dinh },
        PUBLISHER = { Springer Berlin Heidelberg },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://link.springer.com/article/10.1007%2Fs40745-015-0030-3 },
    }
J
  • Stable feature selection for clinical prediction: Exploiting ICD tree structure using Tree-Lasso
    Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 53:277-290, Feb. 2015. [ | | pdf]
    Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMRs). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients’ diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related ℓ1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to randomly select only one feature from a group of correlated features. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare Tree-Lasso with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications for identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis. (A minimal sketch of the tree-structured penalty follows this entry.)
    @ARTICLE { kamkar_gupta_phung_venkatesh_bi15,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable feature selection for clinical prediction: Exploiting {ICD} tree structure using Tree-Lasso },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2015 },
        VOLUME = { 53 },
        PAGES = { 277--290 },
        MONTH = { Feb. },
        ISSN = { 1532-0464 },
        ABSTRACT = { Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMRs). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients’ diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related ℓ1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to randomly select only one feature from a group of correlated features. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare Tree-Lasso with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications for identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis. },
        DOI = { 10.1016/j.jbi.2014.11.013 },
        FILE = { :kamkar_gupta_phung_venkatesh_bi15 - Stable Feature Selection for Clinical Prediction_ Exploiting ICD Tree Structure Using Tree Lasso.pdf:PDF },
        KEYWORDS = { Feature selection, Lasso, Tree-Lasso, Feature stability, Classification },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046414002639 },
    }
J
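To see how a tree-structured penalty keeps correlated codes in a branch together, here is a minimal sketch of the proximal operator of a nested-group penalty sum_g lam * ||w[g]||_2. For groups arranged in a tree and processed children-before-parents, composing group soft-thresholding operators gives the exact prox (a known result for tree-structured norms); the ICD-like grouping and the lam value below are hypothetical.

```python
import numpy as np

def group_soft_threshold(w, idx, lam):
    """Shrink the subvector w[idx] toward zero by lam in l2 norm."""
    norm = np.linalg.norm(w[idx])
    w[idx] = 0.0 if norm <= lam else w[idx] * (1.0 - lam / norm)
    return w

def tree_lasso_prox(w, groups, lam):
    """groups: index lists ordered leaves-first (children before parents)."""
    w = w.copy()
    for idx in groups:
        w = group_soft_threshold(w, np.asarray(idx), lam)
    return w

# Toy ICD-like tree: leaves {0},{1},{2},{3}; branches {0,1},{2,3}; root {0..3}.
groups = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
w = np.array([0.9, -0.8, 0.1, -0.05])
print(tree_lasso_prox(w, groups, lam=0.2))   # branch {2,3} is zeroed together
```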
  • Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Journal of Heuristics, 21(1):25-45, Feb. 2015. [ | | pdf]
    The maximum a posteriori assignment for Markov random fields (MRFs) of general structure is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree structures embedded within MRFs to derive a strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using limited memory and without any requirement on the cost functions. We evaluate T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our method is competitive against state-of-the-art rivals with a significant computational gain. (A minimal ILS skeleton follows this entry.)
    @ARTICLE { tran_phung_venkatesh_jh15,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Tree-based Iterated Local Search for {M}arkov {R}andom {F}ields with Applications in Image Analysis },
        JOURNAL = { Journal of Heuristics },
        YEAR = { 2015 },
        VOLUME = { 21 },
        NUMBER = { 1 },
        PAGES = { 25--45 },
        MONTH = { Feb. },
        ABSTRACT = { The maximum a posteriori assignment for Markov random fields (MRFs) of general structure is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree structures embedded within MRFs to derive a strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using limited memory and without any requirement on the cost functions. We evaluate T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our method is competitive against state-of-the-art rivals with a significant computational gain. },
        DOI = { 10.1007/s10732-014-9270-1 },
        FILE = { :tran_phung_venkatesh_jh15 - Tree Based Iterated Local Search for Markov Random Fields with Applications in Image Analysis.pdf:PDF },
        KEYWORDS = { Iterated local search, Strong local search, Belief propagation, Markov random fields, MAP assignment },
        OWNER = { tund },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2014.10.14 },
        URL = { http://link.springer.com/article/10.1007%2Fs10732-014-9270-1 },
    }
J
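The generic control flow of iterated local search is short enough to sketch. In the paper the local-search step runs belief propagation on trees embedded in the MRF; the stand-in below uses simple iterated conditional modes (ICM) on a toy chain Ising model, so only the ILS skeleton (local search, perturbation "kick", accept-if-better) is faithful, and all potentials are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30                                   # chain-structured toy Ising model
unary = rng.normal(size=(N, 2))          # unary potentials (cost per label)
pair = 0.5                               # penalty for disagreeing neighbours

def energy(x):
    return unary[np.arange(N), x].sum() + pair * np.sum(x[:-1] != x[1:])

def icm(x):
    """Local search stand-in: greedily relabel variables until no improvement."""
    improved = True
    while improved:
        improved = False
        for i in range(N):
            for v in (0, 1):
                y = x.copy(); y[i] = v
                if energy(y) < energy(x):
                    x, improved = y, True
    return x

def perturb(x, k=3):
    """ILS kick: flip k randomly chosen variables to escape the local optimum."""
    y = x.copy()
    idx = rng.choice(N, size=k, replace=False)
    y[idx] = 1 - y[idx]
    return y

x = icm(rng.integers(0, 2, size=N))
for _ in range(20):                      # accept-if-better ILS loop
    cand = icm(perturb(x))
    if energy(cand) < energy(x):
        x = cand
print(energy(x))
```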
  • Tensor-variate Restricted Boltzmann Machines
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 29th AAAI Conference on Artificial Intelligence (AAAI), pages 2887-2893, Austin, Texas, USA, January 2015. [ | | pdf]
    Restricted Boltzmann Machines (RBMs) are an important class of latent-variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than those of rivals, resulting in better classification performance. (A minimal sketch of the multiplicative interaction follows this entry.)
    @INPROCEEDINGS { nguyen_tran_phung_venkatesh_aaai15,
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Tensor-variate Restricted {B}oltzmann Machines },
        BOOKTITLE = { 29th AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2015 },
        PAGES = { 2887--2893 },
        ADDRESS = { Austin, Texas, USA },
        MONTH = { January },
        ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent-variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than those of rivals, resulting in better classification performance. },
        FILE = { :nguyen_tran_phung_venkatesh_aaai15 - Tensor Variate Restricted Boltzmann Machines.pdf:PDF },
        KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },
        OWNER = { ngtu },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/download/9371/9956 },
    }
C
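The compactness claim, parameters growing linearly with the number of modes, can be illustrated with a rank-1 multiplicative interaction per hidden unit: for a matrix-variate input X, hidden unit k has pre-activation w1_k^T X w2_k, costing D1*K + D2*K parameters instead of D1*D2*K. This factorized form is an illustrative assumption, not necessarily the paper's exact parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_activation(X, W1, W2, c):
    """P(h_k = 1 | X) with pre-activation w1_k^T X w2_k + c_k per hidden unit."""
    pre = np.einsum("ik,ij,jk->k", W1, X, W2) + c   # sums over both data modes
    return sigmoid(pre)

rng = np.random.default_rng(0)
D1, D2, K = 28, 28, 16                  # e.g. an image-shaped input, 16 hidden units
X = rng.random((D1, D2))
W1 = rng.normal(size=(D1, K)) * 0.1     # mode-1 factors
W2 = rng.normal(size=(D2, K)) * 0.1     # mode-2 factors
c = np.zeros(K)
print(hidden_activation(X, W1, W2, c).shape)   # (16,)
```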
  • Continuous discovery of co-location contexts from Bluetooth data
    Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D. Pervasive and Mobile Computing (PMC), 16(B):286-304, Jan. 2015. [ | | pdf]
    The discovery of context is important for context-aware applications in pervasive computing. This problem is challenging because of the streaming nature of the data and the complex, changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing as the computational requirements remain constant in spite of growing data. We examine our model on a synthetic dataset and two real-world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with Gibbs sampling in terms of the normalized factorization error, which shows comparable performance between the two inference methods. As the fixed-lag particle filter processes each small chunk of data as it arrives and does not need to be restarted, its execution time is significantly shorter than that of Gibbs sampling.
    @ARTICLE { nguyen_gupta_venkatesh_phung_pmc15,
        AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },
        TITLE = { Continuous discovery of co-location contexts from {B}luetooth data },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2015 },
        VOLUME = { 16 },
        NUMBER = { B },
        PAGES = { 286--304 },
        MONTH = { Jan. },
        ISSN = { 1574-1192 },
        ABSTRACT = { The discovery of context is important for context-aware applications in pervasive computing. This problem is challenging because of the streaming nature of the data and the complex, changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing as the computational requirements remain constant in spite of growing data. We examine our model on a synthetic dataset and two real-world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with Gibbs sampling in terms of the normalized factorization error, which shows comparable performance between the two inference methods. As the fixed-lag particle filter processes each small chunk of data as it arrives and does not need to be restarted, its execution time is significantly shorter than that of Gibbs sampling. },
        DOI = { 10.1016/j.pmcj.2014.12.005 },
        FILE = { :nguyen_gupta_venkatesh_phung_pmc15 - Continuous Discovery of Co Location Contexts from Bluetooth Data.pdf:PDF },
        KEYWORDS = { Nonparametric, Indian buffet process, Incremental, Particle filter, Co-location context },
        OWNER = { Thuong Nguyen },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2014.12.18 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119214001941 },
    }
J
  • Visual Object Clustering via Mixed-Norm Regularization
    Zhang, Xin, Pham, Duc-Son, Phung, Dinh, Liu, Wanquan, Saha, Budhaditya and Venkatesh, Svetha. In Winter Conference on Applications of Computer Vision (WACV), pages 1030-1037, Jan. 2015. [ | | pdf]
    Many vision problems, such as motion segmentation and face clustering, deal with high-dimensional data. However, such high-dimensional data usually lie in low-dimensional structures. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. This principle is motivated by an idealized modeling of data points according to linear algebra theory. However, real data in computer vision are unlikely to follow the ideal model perfectly. In this paper, we exploit mixed-norm regularization for sparse subspace clustering. This regularization term is a convex combination of the ℓ1 norm, which promotes sparsity at the individual level, and the block norm ℓ2/1, which promotes group sparsity. Combining these powerful regularization terms provides more accurate modeling and, subsequently, a better solution for the affinity matrix used in sparse subspace clustering. This helps us achieve better performance on motion segmentation and face clustering problems. The formulation also caters for different types of data corruption. We derive a provably convergent and computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the formulation. We demonstrate that this formulation outperforms other state-of-the-art methods on both motion segmentation and face clustering. (A minimal sketch of the mixed-norm proximal step follows this entry.)
    @INPROCEEDINGS { zhang_pham_phung_liu_budhaditya_venkatesh_wacv15,
        AUTHOR = { Zhang, Xin and Pham, Duc-Son and Phung, Dinh and Liu, Wanquan and Saha, Budhaditya and Venkatesh, Svetha },
        TITLE = { Visual Object Clustering via Mixed-Norm Regularization },
        BOOKTITLE = { Winter Conference on Applications of Computer Vision (WACV) },
        YEAR = { 2015 },
        PAGES = { 1030--1037 },
        MONTH = { Jan. },
        ABSTRACT = { Many vision problems, such as motion segmentation and face clustering, deal with high-dimensional data. However, such high-dimensional data usually lie in low-dimensional structures. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. This principle is motivated by an idealized modeling of data points according to linear algebra theory. However, real data in computer vision are unlikely to follow the ideal model perfectly. In this paper, we exploit mixed-norm regularization for sparse subspace clustering. This regularization term is a convex combination of the ℓ1 norm, which promotes sparsity at the individual level, and the block norm ℓ2/1, which promotes group sparsity. Combining these powerful regularization terms provides more accurate modeling and, subsequently, a better solution for the affinity matrix used in sparse subspace clustering. This helps us achieve better performance on motion segmentation and face clustering problems. The formulation also caters for different types of data corruption. We derive a provably convergent and computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the formulation. We demonstrate that this formulation outperforms other state-of-the-art methods on both motion segmentation and face clustering. },
        DOI = { 10.1109/WACV.2015.142 },
        FILE = { :zhang_pham_phung_liu_budhaditya_venkatesh_wacv15 - Visual Object Clustering Via Mixed Norm Regularization.pdf:PDF },
        KEYWORDS = { computer vision;image segmentation;matrix algebra;pattern clustering;alternating direction method of multipliers framework;computer vision;face clustering problems;linear algebra theory;mixed-norm regularization;motion segmentation;sparse representation;sparse subspace clustering;visual object clustering problem;Clustering algorithms;Computer vision;Data models;Educational institutions;Face;Motion segmentation;Sparse matrices },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.03 },
        URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7045996 },
    }
C
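A minimal sketch of the mixed-norm machinery: for the sparse-group penalty alpha*||w||_1 + (1-alpha)*sum_g ||w[g]||_2, the proximal operator is exactly elementwise soft-thresholding followed by groupwise shrinkage, which is the basic step an ADMM solver for such formulations would call repeatedly. Group definitions and parameter values below are hypothetical.

```python
import numpy as np

def prox_sparse_group(w, groups, lam, alpha=0.5):
    """Prox of lam * (alpha*||w||_1 + (1-alpha)*sum_g ||w[g]||_2)."""
    # Step 1: elementwise soft-threshold (the l1 part).
    v = np.sign(w) * np.maximum(np.abs(w) - lam * alpha, 0.0)
    # Step 2: group soft-threshold (the l2/1 part).
    out = v.copy()
    for idx in groups:
        idx = np.asarray(idx)
        norm = np.linalg.norm(v[idx])
        shrink = lam * (1 - alpha)
        out[idx] = 0.0 if norm <= shrink else v[idx] * (1 - shrink / norm)
    return out

# Toy usage: two groups of three coefficients each.
w = np.array([0.9, -0.2, 0.05, 0.6, -0.7, 0.01])
print(prox_sparse_group(w, groups=[[0, 1, 2], [3, 4, 5]], lam=0.3))
```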
  • Web search activity data accurately predicts population chronic disease risk in the United States
    Nguyen, Thin, Tran, Truyen, Luo, Wei, Gupta, Sunil, Rana, Santu, Phung, Dinh, Nichols, Melanie, Millar, Lynne, Venkatesh, Svetha and Allender, Steve. Journal of Epidemiology & Community Health, 69(7):693-699, Jan. 2015. [ | | pdf]
    Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). The mean difference between predicted and measured prevalence by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts. (A minimal modeling sketch follows this entry.)
    @ARTICLE { nguyen_tran_luo_gupta_rana_phung_nichols_millar_venkatesh_allender_jech15,
        AUTHOR = { Nguyen, Thin and Tran, Truyen and Luo, Wei and Gupta, Sunil and Rana, Santu and Phung, Dinh and Nichols, Melanie and Millar, Lynne and Venkatesh, Svetha and Allender, Steve },
        TITLE = { Web search activity data accurately predicts population chronic disease risk in the {U}nited {S}tates },
        JOURNAL = { Journal of Epidemiology \& Community Health },
        YEAR = { 2015 },
        VOLUME = { 69 },
        NUMBER = { 7 },
        PAGES = { 693--699 },
        MONTH = { Jan. },
        ISSN = { 0143-005X },
        ABSTRACT = { Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). The mean difference between predicted and measured prevalence by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts. },
        DOI = { 10.1136/jech-2014-204523 },
        FILE = { :nguyen_tran_luo_gupta_rana_phung_nichols_millar_venkatesh_allender_jech15 - Web Search Activity Data Accurately Predicts Population Chronic Disease Risk in the United States.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://jech.bmj.com/content/69/7/693.abstract },
    }
J
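The modeling pipeline reduces to regularized multiple linear regression plus correlation-based validation on held-out years. The sketch below mirrors that pipeline with ridge regression on synthetic stand-in data; the real features, targets and choice of regularizer are as described in the paper, not here.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_states, n_terms = 50, 40
X_train = rng.random((n_states, n_terms))          # search activity, earlier years
w_true = rng.random(n_terms)                       # synthetic ground truth
y_train = X_train @ w_true + rng.normal(scale=0.1, size=n_states)

model = Ridge(alpha=1.0).fit(X_train, y_train)     # regularised linear model

X_test = rng.random((n_states, n_terms))           # later target year
y_test = X_test @ w_true + rng.normal(scale=0.1, size=n_states)
pred = model.predict(X_test)
print("Pearson r: ", pearsonr(pred, y_test)[0])    # validation as in the paper
print("Spearman r:", spearmanr(pred, y_test)[0])
```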
2014
  • A Random Finite Set Model for Data Clustering
    Phung, D. and Vo, B.N. In Proceedings of International Conference on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf]
    The goal of data clustering is to partition data points into groups so as to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, i.e., a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. (A minimal likelihood sketch follows this entry.)
    @CONFERENCE { phung_vo_fusion14,
        TITLE = { A Random Finite Set Model for Data Clustering },
        AUTHOR = { Phung, D. and Vo, B.N. },
        BOOKTITLE = { Proceedings of International Conference on Fusion (FUSION) },
        YEAR = { 2014 },
        ADDRESS = { Salamanca, Spain },
        MONTH = { July },
        ABSTRACT = { The goal of data clustering is to partition data points into groups so as to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, i.e., a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.05.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },
    }
C
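The key modeling object here is the likelihood of a set of points. Under a Poisson random finite set with intensity lambda * N(mu, Sigma), a point pattern X has log-likelihood -lambda + |X| log lambda + sum_x log N(x; mu, Sigma), which is the per-cluster term a Dirichlet-process mixture of Poisson RFSs would use. The Gaussian form and all parameters below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def poisson_rfs_loglik(X, lam, mu, Sigma):
    """log p(X) = -lam + |X| * log(lam) + sum_x log N(x; mu, Sigma)."""
    X = np.atleast_2d(X)
    return -lam + len(X) * np.log(lam) + multivariate_normal.logpdf(X, mu, Sigma).sum()

# Toy usage: one datum is a *set* of 2-D points, not a single vector.
pattern = np.array([[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]])
print(poisson_rfs_loglik(pattern, lam=3.0, mu=np.zeros(2), Sigma=np.eye(2)))
```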
  • Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process
    Phung, D., Nguyen, T. C., Gupta, S. and Venkatesh, S. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174, Elsevier, 2014. [ | | pdf | code]
    Understanding human activities is an important research topic, notably in assisted living and health monitoring. Beyond simple forms of activity (e.g., an RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups.
    @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,
        TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },
        AUTHOR = { Phung, D. and Nguyen, T. C. and Gupta, S. and Venkatesh, S. },
        BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },
        PUBLISHER = { Elsevier },
        YEAR = { 2014 },
        EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. Goldman },
        PAGES = { 149--174 },
        ABSTRACT = { Understanding human activities is an important research topic, notably in assisted living and health monitoring. Beyond simple forms of activity (e.g., an RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { ctng },
        TIMESTAMP = { 2013.07.25 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },
    }
BC
  • Proceedings of the Sixth Asian Conference on Machine Learning
    Phung, Dinh and Li, Hang, editor. volume 39 of JMLR Workshop and Conference Proceedings, JMLR, Nov. 2014. [ | | pdf]
    @PROCEEDINGS { phung_li_acml14proceedings,
        TITLE = { Proceedings of the Sixth Asian Conference on Machine Learning },
        YEAR = { 2014 },
        EDITOR = { Phung, Dinh and Li, Hang },
        MONTH = { Nov. },
        PUBLISHER = { JMLR },
        SERIES = { JMLR Workshop and Conference Proceedings },
        VOLUME = { 39 },
        LOCATION = { Nha Trang, Vietnam },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.11 },
        URL = { http://jmlr.org/proceedings/papers/v39/ },
    }
P
  • Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
    Nguyen, V., Phung, D., Venkatesh, S., Nguyen, X.L. and Bui, H. In Proc. of International Conference on Machine Learning (ICML), pages 288-296, Beijing, China, 2014. [ | ]
    We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partition groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
    @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,
        TITLE = { {B}ayesian Nonparametric Multilevel Clustering with Group-Level Contexts },
        AUTHOR = { Nguyen, V. and Phung, D. and Venkatesh, S. and Nguyen, X.L. and Bui, H. },
        BOOKTITLE = { Proc. of International Conference on Machine Learning (ICML) },
        YEAR = { 2014 },
        ADDRESS = { Beijing, China },
        PAGES = { 288--296 },
        ABSTRACT = { We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partition groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains. },
        OWNER = { tvnguye },
        TIMESTAMP = { 2013.12.13 },
    }
C
  • Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter
    Vo, B-N, Vo, B-T and Phung, Dinh. IEEE Transactions on Signal Processing, 62(24):6554-6567, 2014. [ | ]
    @ARTICLE { vo_vo_phung_isp14,
        TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },
        AUTHOR = { Vo, B-N and Vo, B-T and Phung, Dinh },
        JOURNAL = { IEEE Transactions on Signal Processing },
        YEAR = { 2014 },
        NUMBER = { 24 },
        PAGES = { 6554--6567 },
        VOLUME = { 62 },
        OWNER = { dinh },
        TIMESTAMP = { 2014.07.02 },
    }
J
  • Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions
    Gupta, S., Rana, S., Phung, D. and Venkatesh, S.. In Proc. of SIAM Int. Conference on Data Mining (SDM) (accepted), Philadelphia, Pennsylvania, USA, April 2014. [ | ]
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,
        TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },
        AUTHOR = { Gupta, S. and Rana, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of SIAM Int. Conference on Data Mining (SDM) (accepted) },
        YEAR = { 2014 },
        ADDRESS = { Philadelphia, Pennsylvania, USA },
        MONTH = { April },
        OWNER = { Thuongnc },
        TIMESTAMP = { 2014.01.05 },
    }
C
  • Stabilized Sparse Ordinal Regression for Medical Risk Stratification
    Truyen Tran, Dinh Phung, Wei Luo and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2014. [ | ]
    The recent wide adoption of Electronic Medical Records (EMRs) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMRs. First, a conceptual view of the EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied to predict cumulative or progressive risk. The challenge is to build a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. (A minimal sketch of a graph-stabilized sparse penalty follows this entry.)
    @ARTICLE { tran_phung_luo_venkatesh_kais14,
        TITLE = { Stabilized Sparse Ordinal Regression for Medical Risk Stratification },
        AUTHOR = { Truyen Tran and Dinh Phung and Wei Luo and Svetha Venkatesh },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2014 },
        PAGES = { (accepted for publication on 17 Jan 2014) },
        ABSTRACT = { The recent wide adoption of Electronic Medical Records (EMRs) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMRs. First, a conceptual view of the EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied to predict cumulative or progressive risk. The challenge is to build a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.01.28 },
    }
J
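A minimal sketch of a graph-stabilized sparse penalty in the spirit of this paper: an l1 term for sparsity plus a quadratic term w^T L w, where L is the Laplacian of a feature-interaction graph, so correlated features are encouraged to receive similar weights. The generic form and toy graph below are assumptions, not the paper's exact priors.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def stabilized_penalty(w, L, lam1, lam2):
    """l1 sparsity plus smoothness over the feature graph: w^T L w
    equals the sum of (w_i - w_j)^2 over graph edges."""
    return lam1 * np.abs(w).sum() + lam2 * (w @ L @ w)

# Toy usage: features 0 and 1 interact, so very different weights on them
# are penalized more than similar ones.
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
L = laplacian(A)
print(stabilized_penalty(np.array([0.8, 0.0, 0.3]), L, lam1=0.1, lam2=1.0))
print(stabilized_penalty(np.array([0.4, 0.4, 0.3]), L, lam1=0.1, lam2=1.0))
```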
  • Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records
    Gopakumar, Shivapratap, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the Second International Workshop on Pattern Recognition for Healthcare Analytics, 2014. [ | | pdf]
    Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data, which invite sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-event outcomes using clinical structures inherent in Electronic Medical Records (EMRs). Model estimation is stabilized using a feature graph derived from two types of EMR structures: the temporal structure of disease and intervention recurrences, and the hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures – the Jaccard index and the Consistency index – the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58, 0.69]) for 6-month prediction. (A minimal sketch of the stability measures follows this entry.)
    @INPROCEEDINGS { gopakumar_tran_phung_venkatesh_icpr_ws14,
        TITLE = { Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records },
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of the Second International Workshop on Pattern Recognition for Healthcare Analytics },
        YEAR = { 2014 },
        ABSTRACT = { Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data, which invite sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-event outcomes using clinical structures inherent in Electronic Medical Records (EMRs). Model estimation is stabilized using a feature graph derived from two types of EMR structures: the temporal structure of disease and intervention recurrences, and the hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures – the Jaccard index and the Consistency index – the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58, 0.69]) for 6-month prediction. },
        URL = { https://sites.google.com/site/iwprha2/proceedings },
    }
C
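The two stability measures named in the abstract are easy to compute over feature sets selected on bootstrap resamples. The sketch below uses the standard Jaccard index and, for the Consistency index, Kuncheva's definition (an assumption about which consistency index is meant); the selected feature sets are toy stand-ins.

```python
import numpy as np
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two selected-feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def kuncheva(a, b, n_features):
    """Kuncheva's consistency index for equal-size subsets of an n-feature pool."""
    k, r = len(a), len(set(a) & set(b))
    return (r * n_features - k * k) / (k * (n_features - k))

def mean_pairwise(subsets, fn, **kw):
    """Average a pairwise index over all pairs of bootstrap selections."""
    return np.mean([fn(a, b, **kw) for a, b in combinations(subsets, 2)])

# Toy usage: features selected by the model on three bootstrap resamples.
selected = [{0, 1, 2, 5}, {0, 1, 3, 5}, {0, 2, 3, 5}]
print("Jaccard: ", mean_pairwise(selected, jaccard))
print("Kuncheva:", mean_pairwise(selected, kuncheva, n_features=20))
```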
  • Individualized Arrhythmia Detection with ECG Signals from Wearable Devices
    Nguyen, Thanh-Binh, Luo, Wei, Caelli, Terry, Venkatesh, Svetha and Phung, Dinh. In The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014), Shanghai, China, 2014. [ | ]
    Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmias are diagnosed among patients with mild symptoms. With the large amount of data generated by long-term monitoring come new data science and analytics challenges. Although traditional rule-based detection algorithms still work on relatively short, clinical-quality ECG, they are not optimal for pervasive signals collected from wearable devices: they do not adapt to individual differences and assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low-quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification were selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and the clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician-validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules. (A minimal normalization sketch follows this entry.)
    @INPROCEEDINGS { nguyen_luo_caelli_venkatesh_phung_dsaa14,
        TITLE = { Individualized Arrhythmia Detection with ECG Signals from Wearable Devices },
        AUTHOR = { Nguyen, Thanh-Binh and Luo, Wei and Caelli, Terry and Venkatesh, Svetha and Phung, Dinh },
        BOOKTITLE = { The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014) },
        YEAR = { 2014 },
        ADDRESS = { Shanghai, China },
        ABSTRACT = { Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmias are diagnosed among patients with mild symptoms. With the large amount of data generated by long-term monitoring come new data science and analytics challenges. Although traditional rule-based detection algorithms still work on relatively short, clinical-quality ECG, they are not optimal for pervasive signals collected from wearable devices: they do not adapt to individual differences and assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low-quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification were selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and the clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician-validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules. },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.08.21 },
    }
C
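Robust normalization with median and interquartile range, as referred to in the abstract, looks like the sketch below; the specific ECG features and data are hypothetical stand-ins.

```python
import numpy as np

def robust_normalize(feats):
    """Scale each column by (x - median) / IQR, robust to outliers and drift."""
    med = np.median(feats, axis=0)
    q75, q25 = np.percentile(feats, [75, 25], axis=0)
    iqr = np.where(q75 - q25 > 0, q75 - q25, 1.0)   # guard degenerate columns
    return (feats - med) / iqr

# Toy usage: rows = heartbeats, columns = [RR interval, QRS width] for one person.
rng = np.random.default_rng(0)
beats = np.column_stack([rng.normal(0.8, 0.05, 500), rng.normal(0.09, 0.01, 500)])
print(robust_normalize(beats).mean(axis=0))          # roughly centred at zero
```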
  • Unsupervised Inference of Significant Locations from WiFi Data for Understanding Human Dynamics
    Nguyen, Thanh-Binh, Nguyen, Thuong C., Luo, Wei, Venkatesh, Svetha and Phung, Dinh. In The 13th International Conference on Mobile and Ubiquitous Multimedia (MUM2014), pages 232-235, 2014. [ | | pdf]
    Motion and location are essential to understanding human dynamics. This paper presents a method to discover significant locations and daily routines of individuals from WiFi data, which is considered more suitable than GPS data for the study of human dynamics. Our method determines significant locations by clustering access points in close proximity using the Affinity Propagation algorithm, which has the advantage of automatically determining the number of locations. We demonstrate our method on the MDC dataset, which includes more than 30 million WiFi scans. The experimental results show good clustering performance and also superior temporal coverage in comparison to a multimodal approach on the same dataset. From the discovered location trajectories, we can learn interesting mobility patterns of mobile phone users. The human dynamics of participants are reflected through the entropy of the location distributions, which shows interesting correlations with the age and occupations of users. (A minimal clustering sketch follows this entry.)
    @INPROCEEDINGS { nguyen_nguyen_lou_venkatesh_phung_mum14,
        TITLE = { Unsupervised Inference of Significant Locations from WiFi Data for Understanding Human Dynamics },
        AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Thuong C. and Luo, Wei and Venkatesh, Svetha and Phung, Dinh },
        BOOKTITLE = { The 13th International Conference on Mobile and Ubiquitous Multimedia (MUM2014) },
        YEAR = { 2014 },
        PAGES = { 232--235 },
        ABSTRACT = { Motion and location are essential to understanding human dynamics. This paper presents a method to discover significant locations and daily routines of individuals from WiFi data, which is considered more suitable than GPS data for the study of human dynamics. Our method determines significant locations by clustering access points in close proximity using the Affinity Propagation algorithm, which has the advantage of automatically determining the number of locations. We demonstrate our method on the MDC dataset, which includes more than 30 million WiFi scans. The experimental results show good clustering performance and also superior temporal coverage in comparison to a multimodal approach on the same dataset. From the discovered location trajectories, we can learn interesting mobility patterns of mobile phone users. The human dynamics of participants are reflected through the entropy of the location distributions, which shows interesting correlations with the age and occupations of users. },
        DOI = { 10.1145/2677972.2677997 },
        FILE = { :papers\\activityrecognition\\nguyen_nguyen_lou_venkatesh_phung_mum14.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2014.10.20 },
        URL = { http://dl.acm.org/citation.cfm?id=2677972.2677997&coll=DL&dl=ACM&CFID=590574626&CFTOKEN=81216827 },
    }
C
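Affinity Propagation, which selects the number of clusters automatically, is available directly in scikit-learn. The sketch below clusters synthetic stand-in coordinates for access points; the paper's actual similarity measure between access points may differ.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Synthetic stand-in: 2-D embeddings of access points around three locations.
aps = np.vstack([rng.normal(c, 0.3, size=(15, 2)) for c in ([0, 0], [4, 0], [2, 3])])

# Affinity Propagation determines the number of clusters from the data itself.
ap = AffinityPropagation(random_state=0).fit(aps)
print("discovered locations:", len(ap.cluster_centers_indices_))
print("labels:", ap.labels_[:10])
```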
  • Analysis of Circadian Rhythms from Online Communities of Individuals with Affective Disorders
    Dao, Bo, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. In The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014), Shanghai, China, 2014. [ | ]
    @INPROCEEDINGS { dao_nguyen_phung_venkatesh_dsaa14,
        TITLE = { Analysis of Circadian Rhythms from Online Communities of Individuals with Affective Disorders },
        AUTHOR = { Dao, Bo and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014) },
        YEAR = { 2014 },
        ADDRESS = { Shanghai, China },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.08.21 },
    }
C
  • Topic Model Kernel Classification With Probabilistically Reduced Features
    V. Nguyen, D. Phung and S. Venkatesh. Journal of Data Science, 2014. [ | ]
    @ARTICLE { nguyen_phung_venkatesh_jds14,
        TITLE = { Topic Model Kernel Classification With Probabilistically Reduced Features },
        AUTHOR = { V. Nguyen and D. Phung and S. Venkatesh },
        JOURNAL = { Journal of Data Science },
        YEAR = { 2014 },
        PAGES = { accepted on 27/10/2014 },
        OWNER = { tvnguye },
        TIMESTAMP = { 2014.11.03 },
    }
J
  • Affective and Content Analysis of Online Depression Communities
    Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh and Michael Berk. IEEE Transactions on Affective Computing, 2014. [ | | pdf]
    A large number of people use online communities to discuss mental health issues, offering opportunities for a new understanding of these communities. This paper studies the characteristics of online depression communities (CLINICAL) in comparison with other online communities (CONTROL). We use machine learning and statistical methods to discriminate online messages between depression and control communities using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these communities. All aspects, including mood, written content and writing style, are found to differ significantly between the two types of communities. Sentiment analysis shows that the clinical group has lower valence than the control group. For language styles and topics, statistical tests reject the hypothesis of equality of psycholinguistic processes and topics between the two groups. We show good predictive validity in depression classification using topics and psycholinguistic clues as features (a toy version of this classification setup follows this entry). Clear discrimination between writing styles and contents, with good predictive power, is an important step in understanding social media and its use in mental health.
    @ARTICLE { nguyen_phung_dao_venkatesh_berk_tac14,
        TITLE = { Affective and Content Analysis of Online Depression Communities },
        AUTHOR = { Thin Nguyen and Dinh Phung and Bo Dao and Svetha Venkatesh and Michael Berk },
        JOURNAL = { IEEE Transactions on Affective Computing },
        YEAR = { 2014 },
        PAGES = { (to appear) },
        ABSTRACT = { A large number of people use online communities to discuss mental health issues, thus offering opportunities for new understanding of these communities. This paper aims to study the characteristics of online depression communities (CLINICAL) in comparison with those joining other online communities (CONTROL). We use machine learning and statistical methods to discriminate online messages between depression and control communities using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these communities. All aspects including mood, the written content and writing style are found to be significantly different between two types of communities. Sentiment analysis shows the clinical group have lower valence than people in the control group. For language styles and topics, statistical tests reject the hypothesis of equality on psycholinguistic processes and topics between two groups. We show good predictive validity in depression classification using topics and psycholinguistic clues as features. Clear discrimination between writing styles and contents, with good predictive power is an important step in understanding social media and its use in mental health. },
        OWNER = { thinng },
        TIMESTAMP = { 2014.03.31 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6784326 },
    }
J
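    A toy sketch of the post-classification setup, with TF-IDF features standing in for the paper's mood, psycholinguistic (LIWC-style) and topic features; the posts, labels and pipeline are invented for illustration:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        posts = ["i cannot sleep and everything feels hopeless",
                 "great hike with friends today",
                 "another heavy, empty week",
                 "the new recipe turned out well"]
        labels = [1, 0, 1, 0]   # 1 = depression community, 0 = control

        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit(posts, labels)
        print(clf.predict(["feeling really low this week"]))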
  • Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models
    Li, C., Rana, S., Phung, D. and Venkatesh, S.. In Proceedings of International Conference on Pattern Recognition (ICPR) (accepted), 2014. [ | ]
    We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wddCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures (a toy distance function follows this entry). We derive an efficient inference method for the wddCRF using an MCMC technique. We evaluate on a real-world medical dataset of about 1,000 patients with polyvascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
    @INPROCEEDINGS { li_rana_phung_venkatesh_icpr14,
        TITLE = { Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models },
        AUTHOR = { Li, C. and Rana, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of International Conference on Pattern Recognition (ICPR) (accepted) },
        YEAR = { 2014 },
        ABSTRACT = { We propose a novel hierarchical Bayesian framework, word-distance-dependent Chinese restaurant franchise (wddCRF) for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application on Electronic Medical Records (EMRs). Typically, a EMRs dataset consists of several patients (documents) and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically-coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wddCRF using MCMC technique. We evaluate on a real world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, hierarchical Dirichlet process (HDP), our model discovers topics which are superior in terms of both qualitative and quantitative measure. },
        OWNER = { chengl },
        TIMESTAMP = { 2014.03.27 },
    }
C
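    A toy illustration of a tree-based word distance of the kind the paper introduces, treating the shared prefix of two hierarchical diagnosis codes as their lowest common ancestor; the codes and the path-length definition are illustrative, not the paper's exact functions:

        # Hypothetical distance between two hierarchical diagnosis codes:
        # the shared-prefix length is the depth of the lowest common ancestor,
        # and the distance is the up-and-down path length in the code tree.
        def tree_distance(a: str, b: str) -> int:
            lca = 0
            for x, y in zip(a, b):
                if x != y:
                    break
                lca += 1
            return (len(a) - lca) + (len(b) - lca)

        print(tree_distance("I25.1", "I25.9"))   # siblings: short path
        print(tree_distance("I25.1", "E11.9"))   # different chapters: long path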
  • Effect of Mood, Social Connectivity and Age in Online Depression Community via Topic and Linguistic Analysis
    Dao, Bo, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. In International Conference on Web Information System Engineering (WISE 2014), Thessaloniki, Greece, 2014. [ | | pdf?]
    Depression afflicts one in four people during their lives. Several studies have shown that, for the isolated and mentally ill, the Web and social media provide effective platforms for support and treatment, as well as for acquiring a scientific, clinical understanding of this condition. More and more individuals affected by depression join online communities to seek information, express themselves, share their concerns and look for support [11]. For the first time, we collect and study a large online depression community of more than 12,000 active members from LiveJournal. We examine the effect of mood, social connectivity and age on the online messages authored by members of an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and with different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between the low and high valence cohorts, whilst topics are significantly different between people with different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The findings suggest the potential of using social media in depression screening, especially in online settings.
    @INPROCEEDINGS { dao_nguyen_phung_venkatesh_wise14,
        TITLE = { Effect of Mood, Social Connectivity and Age in Online Depression Community via Topic and Linguistic Analysis },
        AUTHOR = { Dao, Bo and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Web Information System Engineering (WISE 2014) },
        YEAR = { 2014 },
        ADDRESS = { Thessaloniki, Greece },
        ABSTRACT = { Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for supports and treatments as well as to acquire scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek for information, express themselves, share their concerns and look for supports [11]. For the first time, we collect and study a large online depression community of more than 12,000 active members from Live Journal. We examine the effect of mood, social connectivity and age on the online messages authored by members in an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and in different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people whose different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The finding suggests the potential of using social media in depression screening, especially in online setting. },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.07.11 },
        URL = { 2014\conferences\dao_nguyen_phung_venkatesh_wise14.pdf },
    }
C
  • Affective, Linguistic and Topic Patterns in Online Autism Communities
    Nguyen, Thin, Duong, Thi, Phung, Dinh and Venkatesh, Svetha. In International Conference on Web Information System Engineering (WISE 2014), Thessaloniki, Greece, 2014. [ | | pdf?]
    Online communities offer a platform to support and discuss health issues, providing a more accessible way to bring together people with the same concerns or interests. This paper studies the characteristics of online autism communities (Clinical) in comparison with other online communities (Control), using data from 110 LiveJournal weblog communities. Using machine learning techniques, we analyze these online autism communities comprehensively, studying three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the clinical group has lower valence, indicative of poorer moods than the control group. Topics and language style are shown to be good predictors of autism posts. The results show the potential of social media in medical studies for a broad range of purposes, such as screening, monitoring and subsequently providing support for fragile communities.
    @INPROCEEDINGS { nguyen_duong_phung_venkatesh_wise14,
        TITLE = { Affective, Linguistic and Topic Patterns in Online Autism Communities },
        AUTHOR = { Nguyen, Thin and Duong, Thi and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Web Information System Engineering (WISE 2014) },
        YEAR = { 2014 },
        ADDRESS = { Thessaloniki, Greece },
        ABSTRACT = { Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring people of the same concerns or interests. This paper aims to study the characteristics of online autism communities (Clinical) in comparison with other online communities (Control) using data from 110 Live Journal weblog communities. Using machine learning techniques, we analyze these online autism communities comprehensively, studying three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the clinical group has lower valence, indicative of poorer moods than people in control. Topics and language style are shown to be good predictors of autism posts. The result shows the potential of social media in medical studies for a broad range of purposes such as screening, monitoring and subsequently providing supports for fragile communities. },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.07.11 },
        URL = { 2014\conferences\nguyen_duong_phung_venkatesh_wise14.pdf },
    }
C
  • A Bayesian Nonparametric Framework for Activity Recognition using Accelerometer Data
    Nguyen, T.C., Gupta, S., Venkatesh, S. and Phung, D.. In Proceedings of 22nd International Conference on Pattern Recognition (ICPR), pages 2017-2022, 2014. [ | ]
    Monitoring the daily physical activity of humans plays an important role in preventing disease and improving health. In this paper, we demonstrate a framework for monitoring physical activity levels in daily life. We collect the data using accelerometer sensors in a realistic setting without any supervision. The ground truth of activities is provided by the participants themselves using an experience-sampling application running on mobile phones. The original data is discretized by the hierarchical Dirichlet process (HDP) into different activity levels, with the number of levels inferred automatically (a simplified discretization sketch follows this entry). We validate the accuracy of the extracted patterns by using them for multi-label classification of activities and demonstrate high performance on various standard evaluation metrics. We further show that the extracted patterns are highly correlated with the daily routines of the users.
    @INPROCEEDINGS { nguyen_gupta_venkatesh_phung_icpr14,
        TITLE = { A {B}ayesian Nonparametric Framework for Activity Recognition using Accelerometer Data },
        AUTHOR = { Nguyen, T.C. and Gupta, S. and Venkatesh, S. and Phung, D. },
        BOOKTITLE = { Proceedings of 22nd International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2014 },
        PAGES = { 2017--2022 },
        ABSTRACT = { Monitoring daily physical activity of human plays an important role in preventing the diseases as well as improving health. In this paper, we demonstrate a framework for monitoring the physical activity level in daily life. We collect the data using accelerometer sensors in a realistic setting without any supervision. The ground truth of activities is provided by the participants themselves using an experience sampling application running on mobile phones. The original data is discretized by the hierarchical Dirichlet process (HDP) into different activity levels and the number of levels are inferred automatically. We validate the accuracy of the extracted patterns by using them for the multi-label classification of activities and demonstrate high performances in various standard evaluation metrics. We further show that the extracted patterns are highly correlated to the daily routine of the users. },
        OWNER = { ctng },
        TIMESTAMP = { 2014.02.21 },
    }
C
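    A simplified sketch of the discretization step, using a single-level truncated Dirichlet-process mixture in place of the paper's HDP; the magnitudes are simulated and the 0.01 weight cut-off is an arbitrary illustrative choice:

        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(0)
        # Simulated accelerometer magnitudes: rest, walking, running.
        mags = np.concatenate([rng.normal(1.0, 0.05, 500),
                               rng.normal(1.8, 0.10, 300),
                               rng.normal(3.0, 0.20, 200)])[:, None]

        # Surplus components receive negligible weight, so the number of
        # activity levels is effectively inferred from the data.
        dpgmm = BayesianGaussianMixture(
            n_components=10,
            weight_concentration_prior_type="dirichlet_process",
            random_state=0,
        ).fit(mags)
        levels = dpgmm.predict(mags)
        print("effective activity levels:", int((dpgmm.weights_ > 0.01).sum()))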
  • Nonparametric Discovery of Learning Patterns and Autism Subgroups from Therapeutic Data
    Vellanki, P., Duong, T., Venkatesh, S. and Phung, D.. In Proceedings of 22nd International Conference on Pattern Recognition (ICPR) (accepted), pages 1829-1833, 2014. [ | ]
    Autism Spectrum Disorder (ASD) is growing at a staggering rate, but little is known about its cause. Inferring learning patterns from therapeutic performance data, and subsequently clustering children with ASD into subgroups, is important for understanding this domain and, more importantly, for informing evidence-based intervention. However, this data-driven task was difficult in the past because the available data were insufficient for reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Playpad), whose download count now exceeds 4,500, we present the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning for this task, but a notorious problem with existing methods is that the number of patterns must be correctly specified in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian nonparametric factor analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with an unbounded number of columns to allocate groups of intervention skills to children (a small prior-simulation sketch follows this entry). The optimal number of learning patterns as well as the subgroup assignments are inferred automatically from the data. Our experimental results follow an exploratory approach, presenting several newly discovered learning patterns. To provide quantitative results, we also report clustering evaluations against K-means and NMF. In addition to the novelty of the problem, we demonstrate the suitability of Bayesian nonparametric models over their parametric rivals.
    @INPROCEEDINGS { vellanki_duong_venkatesh_phung_icpr14,
        TITLE = { Nonparametric Discovery of Learning Patterns and Autism Subgroups from Therapeutic Data },
        AUTHOR = { Vellanki, P. and Duong, T. and Venkatesh, S. and Phung, D. },
        BOOKTITLE = { Proceedings of 22nd International Conference on Pattern Recognition (ICPR) (accepted) },
        YEAR = { 2014 },
        PAGES = { 1829-1833 },
        ABSTRACT = { Autism Spectrum Disorder (ASD) is growing at a staggering rate; but, little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain, and more importantly to inform evidence-based intervention. However, this data-driven task was difficult in the past due to the insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Playpad), whose download count is now exceeding 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is correct specification of number of patterns in advance, which in our case is even more difficulty due to complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular which use Bayesian Nonparametric Factor Analysis. This model uses Indian Buffet Process (IBP) as prior on a binary matrix of infinite columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach, present different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and NMF. In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals. },
        OWNER = { pvellank },
        TIMESTAMP = { 2014.04.11 },
    }
C
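    A small simulation of the IBP prior itself, showing how a binary allocation matrix with an unbounded number of columns arises; the function and its parameters are illustrative, not the paper's inference code:

        import numpy as np

        def sample_ibp(num_customers: int, alpha: float, rng) -> np.ndarray:
            # Draw a binary feature-allocation matrix from the Indian Buffet Process.
            dish_counts = []            # how many earlier customers took each dish
            rows = []
            for n in range(1, num_customers + 1):
                row = [rng.random() < (m / n) for m in dish_counts]  # existing dishes
                for k, taken in enumerate(row):
                    dish_counts[k] += taken
                new = rng.poisson(alpha / n)                         # brand-new dishes
                dish_counts.extend([1] * new)
                rows.append(row + [True] * new)
            Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
            for i, row in enumerate(rows):
                Z[i, :len(row)] = row
            return Z

        Z = sample_ibp(8, alpha=2.0, rng=np.random.default_rng(1))
        print(Z)   # rows: children; columns: latent patterns, count not fixed a priori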
  • Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments
    T. Tran, W. Luo, D. Phung, H. Richard, M. Berk, L. Kennedy and S. Venkatesh. BMC Psychiatry, 14(1):76, 2014. [ | | pdf]
    Background. To date, our ability to accurately identify patients at high risk of suicidal behaviour, and thus to target interventions, has been fairly limited. This study examined a large pool of factors potentially associated with suicide risk in a comprehensive electronic medical record (EMR) and derived a predictive model for 1–6 month risk. Methods. 7,399 patients undergoing suicide risk assessment were followed up for 180 days. The dataset was divided into derivation and validation cohorts of 4,911 and 2,488 patients respectively. Clinicians used an 18-point checklist of known risk factors to divide patients into low, medium or high risk. Their predictive ability was compared with a risk stratification model derived from the EMR data. The model was based on the continuation-ratio ordinal regression method coupled with lasso (least absolute shrinkage and selection operator); a toy version of this decomposition follows this entry. Results. In the year prior to suicide assessment, 66.8% of patients attended the emergency department (ED) and 41.8% had at least one hospital admission. Administrative and demographic data, along with information on prior self-harm episodes and mental and physical health diagnoses, were predictive of high-risk suicidal behaviour. Clinicians using the 18-point checklist were relatively poor at predicting patients at high risk within 3 months (AUC 0.58, 95% CI: 0.50–0.66). The EMR-derived model was superior (AUC 0.79, 95% CI: 0.72–0.84). At a specificity of 0.72 (95% CI: 0.70–0.73), the EMR model had a sensitivity of 0.70 (95% CI: 0.56–0.83). Conclusion. Predictive models applied to data from the EMR could improve risk stratification of patients presenting with potential suicidal behaviour. The predictive factors include known risks for suicide, but also other information relating to general health and health service utilisation. Keywords: suicide risk; electronic medical record; predictive models.
    @ARTICLE { Tran_etal_bmc14,
        TITLE = { Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments },
        AUTHOR = { T. Tran and W. Luo and D. Phung and H. Richard and M. Berk and L. Kennedy and S. Venkatesh },
        JOURNAL = { BMC Psychiatry },
        YEAR = { 2014 },
        NUMBER = { 1 },
        PAGES = { 76 },
        VOLUME = { 14 },
        ABSTRACT = { Background To date, our ability to accurately identify patients at high risk from suicidal behaviour, and thus to target interventions, has been fairly limited. This study examined a large pool of factors that are potentially associated with suicide risk from the comprehensive electronic medical record (EMR) and to derive a predictive model for 1–6 month risk. Methods 7,399 patients undergoing suicide risk assessment were followed up for 180 days. The dataset was divided into a derivation and validation cohorts of 4,911 and 2,488 respectively. Clinicians used an 18-point checklist of known risk factors to divide patients into low, medium, or high risk. Their predictive ability was compared with a risk stratification model derived from the EMR data. The model was based on the continuation-ratio ordinal regression method coupled with lasso (which stands for least absolute shrinkage and selection operator). Results In the year prior to suicide assessment, 66.8% of patients attended the emergency department (ED) and 41.8% had at least one hospital admission. Administrative and demographic data, along with information on prior self-harm episodes, as well as mental and physical health diagnoses were predictive of high-risk suicidal behaviour. Clinicians using the 18-point checklist were relatively poor in predicting patients at high-risk in 3 months (AUC 0.58, 95% CIs: 0.50 – 0.66). The model derived EMR was superior (AUC 0.79, 95% CIs: 0.72 – 0.84). At specificity of 0.72 (95% CIs: 0.70-0.73) the EMR model had sensitivity of 0.70 (95% CIs: 0.56-0.83). Conclusion Predictive models applied to data from the EMR could improve risk stratification of patients presenting with potential suicidal behaviour. The predictive factors include known risks for suicide, but also other information relating to general health and health service utilisation. Keywords: Suicide risk; Electronic medical record; Predictive models },
        OWNER = { dinh },
        PUBLISHER = { BioMed Central Ltd },
        TIMESTAMP = { 2014.03.21 },
        URL = { http://www.biomedcentral.com/1471-244X/14/76 },
    }
J
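    A toy version of the continuation-ratio decomposition with a lasso penalty: each ordinal level gets a binary L1-penalised logistic model for P(Y > k | Y >= k), and the level probabilities are chained together. All data here are simulated, and the penalty strength C=0.5 is an arbitrary choice, not the paper's:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 20))                       # stand-in EMR features
        score = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)
        y = np.digitize(score, [-0.3, 0.9])                  # 0=low, 1=medium, 2=high risk

        # One lasso-logistic model per threshold, fit on the "risk >= k" subset.
        models = []
        for k in range(2):
            mask = y >= k
            models.append(LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
                          .fit(X[mask], (y[mask] > k).astype(int)))

        def risk_probabilities(x):
            x = np.atleast_2d(x)
            g = [m.predict_proba(x)[:, 1] for m in models]   # P(Y>k | Y>=k)
            return np.column_stack([1 - g[0],                # P(low)
                                    g[0] * (1 - g[1]),       # P(medium)
                                    g[0] * g[1]])            # P(high)

        print(risk_probabilities(X[:3]).round(2))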
  • Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry
    S. Gupta, T. Tran, W. Luo, D. Phung, R.L. Kennedy, A. Broad, D. Campbell, D. Kipp, M. Singh, M. Khasraw, L. Matheson, D.M. Ashley and S. Venkatesh. BMJ Open Oncology, 4(3), 2014. [ | | pdf]
    Objectives. Using the prediction of cancer outcomes as a model, we tested the hypothesis that analysing routinely collected digital data contained in an electronic administrative record (EAR) with machine-learning techniques could enhance conventional methods of predicting clinical outcomes. Setting. A regional cancer centre in Australia. Participants. Disease-specific data from a purpose-built cancer registry (Evaluation of Cancer Outcomes, ECO) for 869 patients were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and the results were compared to the assessments of five specialist oncologists. Machine-learning prediction using ECO data was compared with that using EAR data and with a model combining ECO and EAR data. Primary and secondary outcome measures. Survival prediction accuracy in terms of the area under the receiver operating characteristic curve (AUC); a short sketch of this evaluation follows the entry. Results. The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months, each slightly better than the performance of the clinician panel. The model performed consistently across a range of cancers, including rare cancers. Combining ECO and EAR data yielded better prediction than the ECO-based model (AUCs ranging from 0.757 to 0.997 at 6 months, from 0.689 to 0.988 at 12 months and from 0.713 to 0.973 at 24 months). Prediction was best for genitourinary, head and neck, lung, skin, and upper gastrointestinal tumours. Conclusions. Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems.
    @ARTICLE { Gupta_etal_bmj14,
        TITLE = { Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry },
        AUTHOR = { S. Gupta and T. Tran and W. Luo and D. Phung and R.L. Kennedy and A. Broad and D. Campbell and D. Kipp and M. Singh and M. Khasraw and L. Matheson and D.M. Ashley and S. Venkatesh },
        JOURNAL = { BMJ Open Oncology },
        YEAR = { 2014 },
        NUMBER = { 3 },
        VOLUME = { 4 },
        ABSTRACT = { Objectives. Using the prediction of cancer outcome as a model, we have tested the hypothesis that through analysing routinely collecteddigital data contained in an electronic administrative record (EAR), using machine-learning techniques, we could enhance conventionalmethods in predicting clinical outcomes. Setting. A regional cancer centre in Australia. Participants Disease-specific data from a purpose-built cancer registry (Evaluation of Cancer Outcomes (ECO)) from 869 patients were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and results compared to the assessment of five specialist oncologists. Machine-learning prediction using ECO data was compared with that using EAR and a model combining ECO and EAR data. Primary and secondary outcome measures Survival prediction accuracy in terms of the area under the receiver operating characteristic curve (AUC). Results. The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months. Each was slightly better than the performance of the clinician panel. The model performed consistently across a range of cancers, including rare cancers. Combining ECO a and EAR data yielded better prediction than the ECO-based model (AUCs ranging from 0.757 to 0.997 for 6 months, AUCs from 0.689 to 0.988 for 12 months and AUCs from 0.713 to 0.973 for 24 months). The best prediction was for genitourinary, head and neck, lung, skin, andupper gastrointestinal tumours. Conclusions. Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems. },
        DOI = { 10.1136/bmjopen-2013-004007 },
        OWNER = { dinh },
        TIMESTAMP = { 2014.03.21 },
        URL = { http://bmjopen.bmj.com/content/4/3/e004007.abstract },
    }
J
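    A brief sketch of the kind of AUC evaluation reported above, with simulated survival labels and model scores for a 94-patient validation cohort; the bootstrap is one common way to obtain confidence intervals and is used here only as an assumption, since the paper does not state its CI method in this abstract:

        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        y_true = rng.integers(0, 2, size=94)            # simulated 6-month survival
        y_score = 0.6 * y_true + 0.8 * rng.random(94)   # simulated model scores

        auc = roc_auc_score(y_true, y_score)

        boots = []
        for _ in range(2000):                           # bootstrap over the cohort
            idx = rng.integers(0, len(y_true), len(y_true))
            if len(np.unique(y_true[idx])) < 2:
                continue                                # AUC needs both classes
            boots.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(boots, [2.5, 97.5])
        print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")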
  • Fixed-lag Particle Filter for Continuous Context Discovery Using Indian Buffet Process
    Nguyen, T. C., Gupta, S., Venkatesh, S. and Phung, D.. In 2014 IEEE International Conference on Pervasive Computing and Communications (PERCOM), pages 20-28, 2014. [ | ]
    Exploiting context from stream data in pervasive environments remains a challenge. We aim to extract proximal context from Bluetooth stream data using an incremental Bayesian nonparametric framework that estimates the number of contexts automatically. Unlike current approaches that can only provide a final proximal grouping, our method provides the proximal grouping and membership of users over time, together with efficient online inference. We construct a co-location matrix over time using Bluetooth data. A Poisson-exponential model is used to factorize this matrix into a factor matrix, interpreted as proximal groups, and a coefficient matrix that indicates factor usage. The coefficient matrix follows the Indian Buffet Process prior, which estimates the number of factors automatically. The non-negativity and sparsity of the factors are enforced by generating them from the exponential distribution. We propose a fixed-lag particle filter algorithm to process data incrementally (a generic skeleton follows this entry). We compare the incremental inference (particle filter) with full batch inference (Gibbs sampling) in terms of normalized factorization error and execution time. The normalized error obtained through our incremental inference is comparable to that of full batch inference, whilst being more than 100 times faster. The discovered factors have similar meaning to the results of the Louvain method, a popular method for community detection.
    @INPROCEEDINGS { nguyen_gupta_venkatesh_phung_percom14,
        TITLE = { Fixed-lag Particle Filter for Continuous Context Discovery Using {I}ndian Buffet Process },
        AUTHOR = { Nguyen, T. C. and Gupta, S. and Venkatesh, S. and Phung, D. },
        BOOKTITLE = { 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom) },
        YEAR = { 2014 },
        PAGES = { 20--28 },
        ABSTRACT = { Exploiting context from stream data in pervasive environments remains a challenge. We aim to extract proximal context from Bluetooth stream data, using an incremental, Bayesian nonparametric framework that estimates the number of contexts automatically. Unlike current approaches that can only provide final proximal grouping, our method provides proximal grouping and membership of users over time. Additionally, it provides an efficient online inference. We construct co-location matrix over time using Bluetooth data. A Poisson-exponential model is used to factorize this matrix into a factor matrix, interpreted as proximal groups, and a coefficient matrix that indicates factor usage. The coefficient matrix follows the Indian Buffet Process prior, which estimates the number of factors automatically. The non-negativity and sparsity of factors are enforced by using the exponential distribution to generate the factors. We propose a fixed-lag particle filter algorithm to process data incrementally. We compare the incremental inference (particle filter) with full batch inference (Gibbs sampling) in terms of normalized factorization error and execution time. The normalized error obtained through our incremental inference is comparable to that of full batch inference, whilst it is more than 100 times faster. The discovered factors have similar meaning to the results of the Louvain method – a popular method for community detection. },
        FILE = { :papers\\phung\\nguyen_gupta_venkatesh_phung_percom14.pdf:PDF },
        OWNER = { ctng },
        TIMESTAMP = { 2013.12.14 },
    }
C
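    A generic skeleton of a fixed-lag particle filter on a toy linear-Gaussian state-space model, standing in for the paper's Poisson-exponential factorization model; the noise scales, particle count and lag are arbitrary illustrative choices:

        import numpy as np

        rng = np.random.default_rng(0)

        def fixed_lag_particle_filter(obs, n_particles=500, lag=5):
            # Toy model: x_t = x_{t-1} + N(0, 0.1);  y_t = x_t + N(0, 0.5).
            particles = rng.normal(0.0, 1.0, n_particles)
            window = [particles.copy()]          # trajectory segments within the lag
            smoothed = []
            for t, y in enumerate(obs):
                particles = particles + rng.normal(0.0, 0.1, n_particles)  # propagate
                logw = -0.5 * ((y - particles) / 0.5) ** 2                 # weight by y_t
                w = np.exp(logw - logw.max())
                w /= w.sum()
                idx = rng.choice(n_particles, n_particles, p=w)            # resample
                particles = particles[idx]
                # Reindex the stored window so particle ancestry stays consistent.
                window = [h[idx] for h in window][-lag:] + [particles.copy()]
                if t >= lag:
                    # Estimate of the state at time t-lag, refined by lag later observations.
                    smoothed.append(window[0].mean())
            return np.array(smoothed)

        true_x = np.cumsum(rng.normal(0.0, 0.1, 100))
        obs = true_x + rng.normal(0.0, 0.5, 100)
        print(fixed_lag_particle_filter(obs)[:5].round(2))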
  • Intervention-Driven Predictive Framework for Modeling Healthcare Data
    Rana, S., Gupta, S., Phung, D. and Venkatesh, S.. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (accepted), Tainan, Taiwan, May 2014. [ | ]
    @INPROCEEDINGS { rana_gupta_phung_venkatesh_pakdd14,
        TITLE = { Intervention-Driven Predictive Framework for Modeling Healthcare Data },
        AUTHOR = { Rana, S. and Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (accepted) },
        YEAR = { 2014 },
        ADDRESS = { Tainan, Taiwan },
        MONTH = { May },
        OWNER = { ctng },
        TIMESTAMP = { 2014.01.05 },
    }
C
2013
  • Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis
    Phung, D., Gupta, S. K., Nguyen, T. and Venkatesh, S.. IEEE Transactions on Multimedia (TMM), 15:1316-1325, May 2013. [ | | pdf]
    Social capital, indicative of community interaction and support, is intrinsically linked to mental health, and an increasing online presence is now the norm. Whilst social capital and its impact on social networks has been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transitions across groups of users with different levels of connectivity, quantifying the patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital, with similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to mental well-being overall. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being.
    @ARTICLE { phung_gupta_nguyen_venkatesh_tmm13,
        TITLE = { Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis },
        AUTHOR = { Phung, D. and Gupta, S. K. and Nguyen, T. and Venkatesh, S. },
        JOURNAL = { IEEE Transactions on Multimedia (TMM) },
        YEAR = { 2013 },
        MONTH = { May },
        PAGES = { 1316-1325 },
        VOLUME = { 15 },
        ABSTRACT = { Social capital indicative of community interaction and support is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks has been examined, its underlying connection to emotional response such as mood, has not been investigated. This paper studies this phenomena, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users of different levels of connectivity, quantifying patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods, than groups with higher social capital. We show similar effects in mood transitions. We establish a framework of how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to mental well-being in overall. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. },
        ISSN = { 0219-1377 },
        LANGUAGE = { English },
        TIMESTAMP = { 2013.04.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/HomePage/phung_gupta_nguyen_venkatesh_tmm13.pdf },
    }
J
  • Bayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster)
    Phung, D.. In International Conference on Bayesian Nonparametrics, Amsterdam, The Netherlands, June 10-14 2013. [ | | code | poster]
    When one considers realistic multimodal data, covariates are rich and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; a patient's demographic information, medical history and drug usage; a social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for modelling their correlation with nonparametric models, without reverting to parametric assumptions. This paper presents a fully Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces (a small stick-breaking sketch follows this entry). Loosely speaking, we use one stochastic process, the DP, to conditionally `index' other stochastic processes. The latter can be designed over any suitable family of stochastic processes to suit modelling needs or the data types of contexts (such as Beta or Gaussian processes); the Dirichlet process is of course an obvious choice and is again employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped-data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various large-scale real datasets, with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for topics which are sensitive to time, using the NIPS and PNAS datasets, b) an application in computer vision for inferring local and global movement patterns, using the MIT dataset consisting of real video data collected at a traffic scene, and c) an application to medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction.
    @INPROCEEDINGS { phung_bnp13,
        TITLE = { {B}ayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) },
        AUTHOR = { Phung, D. },
        BOOKTITLE = { International Conference on Bayesian Nonparametrics },
        YEAR = { 2013 },
        ADDRESS = { Amsterdam, The Netherlands },
        MONTH = { June 10-14 },
        ABSTRACT = { When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; patient's demographic information, medical history and drug usage; social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a full Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet processes (DP). Stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being DP, to conditionally `index' other stochastic processes. The later can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). Dirichlet process is of course an obvious choice and will be again employed in this work. In typical hierarchical Bayesian style, we also provide the model in grouped data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1]. We develop an auxiliary conditional Gibbs sampling in which both topic and context atoms are marginalized out. We demonstrate the framework on synthesis datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time using the NIPS and PNAS dataset, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, c) an application on medical data analysis in which we model latent aspects of diseases, their progression together with the task of re-admission prediction. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { dinh },
        POSTER = { http://prada-research.net/~dinh/uploads/Main/Publications/A0_poster_BNP13.pdf },
        TIMESTAMP = { 2013.03.01 },
    }
C
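    A small sketch of the stick-breaking construction mentioned above: the weights of a truncated Dirichlet process are built by repeatedly breaking off Beta-distributed fractions of the remaining stick; the truncation level and concentration here are illustrative:

        import numpy as np

        def stick_breaking(alpha: float, n_atoms: int, rng) -> np.ndarray:
            # pi_k = beta_k * prod_{j<k} (1 - beta_j),  beta_k ~ Beta(1, alpha)
            betas = rng.beta(1.0, alpha, size=n_atoms)
            remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
            return betas * remaining

        rng = np.random.default_rng(0)
        weights = stick_breaking(alpha=2.0, n_atoms=20, rng=rng)
        atoms = rng.normal(size=20)              # topic atoms drawn from the base measure
        print(weights.round(3), weights.sum())   # sums to just under 1 when truncated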
  • Extraction of Latent Patterns and Contexts from Social Honest Signals Using Hierarchical Dirichlet Processes
    Nguyen, Thuong, Phung, Dinh, Gupta, Sunil and Venkatesh, Svetha. In IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM), pages 47-55, 2013. [ | | pdf | code]
    A fundamental task in pervasive computing is the reliable acquisition of contexts from sensor data. This is crucial to the operation of smart pervasive systems and services so that they can behave efficiently and appropriately in a given context. Simple forms of context can often be extracted directly from raw data. Equally or more important are the hidden contexts and patterns buried inside the data, which are more challenging to discover. Most existing approaches borrow methods and techniques from machine learning and predominantly employ parametric unsupervised learning and clustering techniques. Being parametric, a severe drawback of these methods is the requirement to specify the number of latent patterns in advance. In this paper, we explore the use of Bayesian nonparametric methods, a recent data modelling framework in machine learning, to infer latent patterns from sensor data acquired in a pervasive setting. Under this formalism, nonparametric prior distributions are used for the data generative process; thus they allow the number of latent patterns to be learned automatically and to grow with the data: as more data comes in, the model complexity can grow to explain new and unseen patterns. In particular, we make use of the hierarchical Dirichlet process (HDP) to infer atomic activities and interaction patterns from honest signals collected from sociometric badges. We show how data from these sensors can be represented and learned with the HDP. We illustrate insights into the atomic patterns learned by the model and use them to achieve high-performance clustering. We also demonstrate the framework on the popular Reality Mining dataset, illustrating the ability of the model to automatically infer typical social groups in this dataset. Finally, our framework is generic and applicable to a much wider range of problems in pervasive computing where one needs to infer high-level, latent patterns and contexts from sensor data.
    @INPROCEEDINGS { nguyen_phung_gupta_venkatesh_percom13,
        AUTHOR = { Nguyen, Thuong and Phung, Dinh and Gupta, Sunil and Venkatesh, Svetha },
        TITLE = { Extraction of Latent Patterns and Contexts from Social Honest Signals Using Hierarchical {D}irichlet Processes },
        BOOKTITLE = { IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM) },
        YEAR = { 2013 },
        PAGES = { 47-55 },
        ABSTRACT = { A fundamental task in pervasive computing is reliable acquisition of contexts from sensor data. This is crucial to the operation of smart pervasive systems and services so that they might behave efficiently and appropriately upon a given context. Simple forms of context can often be extracted directly from raw data. Equally important, or more, is the hidden context and pattern buried inside the data, which is more challenging to discover. Most of existing approaches borrow methods and techniques from machine learning, dominantly employ parametric unsupervised learning and clustering techniques. Being parametric, a severe drawback of these methods is the requirement to specify the number of latent patterns in advance. In this paper, we explore the use of Bayesian nonparametric methods, a recent data modelling framework in machine learning, to infer latent patterns from sensor data acquired in a pervasive setting. Under this formalism, nonparametric prior distributions are used for data generative process, and thus, they allow the number of latent patterns to be learned automatically and grow with the data - as more data comes in, the model complexity can grow to explain new and unseen patterns. In particular, we make use of the hierarchical Dirichlet processes (HDP) to infer atomic activities and interaction patterns from honest signals collected from sociometric badges. We show how data from these sensors can be represented and learned with HDP. We illustrate insights into atomic patterns learned by the model and use them to achieve high-performance clustering. We also demonstrate the framework on the popular Reality Mining dataset, illustrating the ability of the model to automatically infer typical social groups in this dataset. Finally, our framework is generic and applicable to a much wider range of problems in pervasive computing where one needs to infer high-level, latent patterns and contexts from sensor data. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        FILE = { :nguyen_phung_gupta_venkatesh_percom13 - Extraction of Latent Patterns and Contexts from Social Honest Signals Using Hierarchical Dirichlet Processes.pdf:PDF },
        OWNER = { Phung, Dinh },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Nguyen_etal_percom13.pdf },
    }
C
  • Thurstonian Boltzmann Machines: Learning from Multiple Inequalities
    Truyen T., Phung D. and Venkatesh, S.. In International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and that each input value signifies several respective inequalities. Thus, learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis.
    @INPROCEEDINGS { truyen_phung_venkatesh_icml13,
        TITLE = { {T}hurstonian {B}oltzmann Machines: Learning from Multiple Inequalities },
        AUTHOR = { Truyen T. and Phung D. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observations that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies a several respective inequalities. Thus learning TBM is essentially learning to make sense of a set of inequalities. The TBM supports the following types naturally: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, (in)-complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures namely handwritten digit recognitions, collaborative filtering and complex survey analysis. },
        OWNER = { dinh },
        TIMESTAMP = { 2013.03.01 },
    }
C
  • Factorial Multi-Task Learning : A Bayesian Nonparametric Approach
    Gupta, S., Phung, D. and Venkatesh, S.. In Proceedings of International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data it is usually difficult to assess task relatedness, and joint learning with unrelated tasks may lead to serious performance degradation. To this end, we propose a framework that groups the tasks based on their relatedness in a low-dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated, and partial or total sharing when the tasks are related. Importantly, the number of task groups and the subspace dimensionality are automatically inferred from the data, which keeps the model from being tied to a pre-specified set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit a potentially infinite number of child beta processes. We apply our model to multi-task regression and classification applications. Experimental results on several synthetic and real-world datasets show the superiority of our model over other recent state-of-the-art multi-task learning methods.
    @INPROCEEDINGS { gupta_phung_venkatesh_icml13,
        TITLE = { Factorial Multi-Task Learning : A Bayesian Nonparametric Approach },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness and joint learning with unrelated tasks may lead to serious performance degradations. To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This feature keeps the model beyond a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods. },
        TIMESTAMP = { 2013.04.16 },
    }
C
  • An Integrated Framework for Suicide Risk Prediction
    Tran, T., Phung, D., Luo, W., Harvey, R., Berk, M. and Venkatesh, S.. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Chicago, US, 2013. [ | ]
    @INPROCEEDINGS { truyen_phung_luo_harvey_berk_venkatesh_sigkdd13,
        TITLE = { An Integrated Framework for Suicide Risk Prediction },
        AUTHOR = { Tran, T. and Phung, D. and Luo, W. and Harvey, R. and Berk, M. and Venkatesh, S. },
        BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2013 },
        ADDRESS = { Chicago, US },
        TIMESTAMP = { 2013.06.07 },
    }
C
  • Sparse Subspace Clustering via Group Sparse Coding
    Saha, B., Pham, D.S., Phung, D. and Venkatesh, S.. In Proceedings of the SIAM International Conference on Data Mining (SDM), pages 130-138, Texas, USA, May 2013. [ | ]
    Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer works at the group level, where we seek sparsity between groups but density within each group (a sketch of the corresponding proximal step follows this entry). The second regularizer models the interactions down to the data-point level via the well-known graph regularization technique. We also derive simple, provably convergent and computationally very efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care datasets to image and text clustering benchmarks, and show that they outperform the state of the art considerably.
    @INPROCEEDINGS { saha_pham_phung_venkatesh_sdm13,
        TITLE = { Sparse Subspace Clustering via Group Sparse Coding },
        AUTHOR = { Saha, B. and Pham, D.S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of the SIAM International Conference on Data Mining (SDM) },
        YEAR = { 2013 },
        ADDRESS = { Texas, USA },
        MONTH = { May },
        PAGES = { 130-138 },
        ABSTRACT = { Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer works at the group level, where we seek sparsity between groups but dense representation within each group. The second regularizer models interactions down to the data-point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging healthcare data to image and text clustering benchmark datasets, and show that they considerably outperform the state of the art. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
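    To make the self-representation idea concrete, here is a minimal sketch of the single-task sparse subspace clustering baseline that the paper generalises with group-level and graph regularizers (both omitted here). The toy data, the lasso penalty value, and the use of scikit-learn are assumptions of the sketch, not the authors' implementation.

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.cluster import SpectralClustering

        def ssc_affinity(X, alpha=0.01):
            """Express each column of X as a sparse combination of the others."""
            n = X.shape[1]
            C = np.zeros((n, n))
            for i in range(n):
                mask = np.arange(n) != i
                lasso = Lasso(alpha=alpha, max_iter=5000)
                lasso.fit(X[:, mask], X[:, i])
                C[mask, i] = lasso.coef_
            return np.abs(C) + np.abs(C).T  # symmetrise for spectral clustering

        rng = np.random.default_rng(0)
        # two 1-D subspaces (lines) in R^3, 20 points each
        U1, U2 = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))
        X = np.hstack([U1 @ rng.normal(size=(1, 20)), U2 @ rng.normal(size=(1, 20))])
        labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                                    random_state=0).fit_predict(ssc_affinity(X))
        print(labels)  # the two halves should receive different labels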
  • Mood sensing from social media texts and its applications
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. Knowledge and Information Systems, 2013. [ | | pdf]
    We present a large-scale mood analysis of social media texts. The paper is organized in three parts: 1) we address the problem of feature selection and classification of mood in the blogosphere; 2) we extract global mood patterns at different levels of aggregation from a large-scale dataset of approximately 18 million documents; and 3) we extract the mood trajectory of an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used, showing that these features are efficient, do not require a training phase, and yield classification results comparable to state-of-the-art supervised feature-selection schemes; on mood patterns, empirical results for mood organisation in the blogosphere are provided, analogous to the structure of human emotion proposed independently in the psychology literature; and on community structure discovery, a sentiment-based approach can yield useful insights into community formation.
    @ARTICLE { nguyen_phung_adams_venkatesh_kais13,
        TITLE = { Mood sensing from social media texts and its applications },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        JOURNAL = { Knowledge and Information Systems },
        YEAR = { 2013 },
        PAGES = { 1-36 },
        ABSTRACT = { We present a large-scale mood analysis of social media texts. The paper is organized in three parts: 1) we address the problem of feature selection and classification of mood in the blogosphere; 2) we extract global mood patterns at different levels of aggregation from a large-scale dataset of approximately 18 million documents; and 3) we extract the mood trajectory of an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used, showing that these features are efficient, do not require a training phase, and yield classification results comparable to state-of-the-art supervised feature-selection schemes; on mood patterns, empirical results for mood organisation in the blogosphere are provided, analogous to the structure of human emotion proposed independently in the psychology literature; and on community structure discovery, a sentiment-based approach can yield useful insights into community formation. },
        CITESEERURL = { http://prada-research.net/~dinh/uploads/Main/Publications/Nguyen_etal_13mood.pdf },
        DOI = { 10.1007/s10115-013-0628-8 },
        ISSN = { 0219-1377 },
        KEYWORDS = { Mood sensing; Mood classification; Mood pattern; Hyper-community },
        LANGUAGE = { English },
        OWNER = { thinng },
        PUBLISHER = { Springer-Verlag },
        TIMESTAMP = { 2013.04.03 },
        URL = { http://link.springer.com/article/10.1007/s10115-013-0628-8/ },
    }
J
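    A toy picture of the lexicon-style mood features used in the first part: count affect-bearing words and normalise by document length, giving training-free features. The word lists below are invented stand-ins, not the psychology-derived feature sets the paper evaluates.

        POSITIVE = {"happy", "joy", "love", "great", "calm"}
        NEGATIVE = {"sad", "angry", "hate", "terrible", "tired"}

        def mood_features(text):
            tokens = text.lower().split()
            n = max(len(tokens), 1)
            # fraction of positive and negative words; no training phase needed
            return [sum(t in POSITIVE for t in tokens) / n,
                    sum(t in NEGATIVE for t in tokens) / n]

        print(mood_features("I love this great sunny day"))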
  • TOBY: Early Intervention in Autism through Technology
    Venkatesh, S., Phung, D., Greenhill, S., Duong, T. and Adams, B.. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 3187-3196, Paris, France, April 2013. [ | ]
    @INPROCEEDINGS { venkatesh_phung_greenhill_duong_adams_chi13,
        TITLE = { {TOBY}: Early Intervention in Autism through Technology },
        AUTHOR = { Venkatesh, S. and Phung, D. and Greenhill, S. and Duong, T. and Adams, B. },
        BOOKTITLE = { Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI) },
        YEAR = { 2013 },
        ADDRESS = { Paris, France },
        MONTH = { April },
        PAGES = { 3187-3196 },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Topic Model Kernel: An Empirical Study Towards Probabilistically Reduced Features For Classification
    Nguyen, Tien V., Phung, D. and Venkatesh, S.. Technical report, Pattern Recognition and Data Analytics, Deakin University, 2013. [ | | pdf]
    Probabilistic topic models have become a standard in modern machine learning, with wide applications in organizing and summarizing `documents' in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by the dimension-reduced mixture proportions extracted from topic models is not only semantically richer than a bag-of-words interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high-dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of the proposed kernel is demonstrated on several classification tasks over real-world datasets. We outperform existing kernels on distributional features and give comparative results on non-probabilistic data types.
    @TECHREPORT { nguyen_phung_venkatesh_tr13,
        TITLE = { Topic Model Kernel: An Empirical Study Towards Probabilistically Reduced Features For Classification },
        AUTHOR = { Nguyen, Tien V. and Phung, D. and Venkatesh, S. },
        INSTITUTION = { Pattern Recognition and Data Analytics, Deakin University },
        YEAR = { 2013 },
        ABSTRACT = { Probabilistic topic models have become a standard in modern machine learning, with wide applications in organizing and summarizing `documents' in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by the dimension-reduced mixture proportions extracted from topic models is not only semantically richer than a bag-of-words interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high-dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of the proposed kernel is demonstrated on several classification tasks over real-world datasets. We outperform existing kernels on distributional features and give comparative results on non-probabilistic data types. },
        OWNER = { nguyen },
        TIMESTAMP = { 2013.07.01 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/nguyen_etal_tr13.pdf },
    }
R
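    A sketch of the general recipe: map topic proportions through a similarity kernel and feed it to an SVM with a precomputed Gram matrix. The exact kernel form below (exponentiated negative Jensen-Shannon divergence) and the fake Dirichlet-distributed "topic proportions" are assumptions for illustration, not necessarily the TMK definition.

        import numpy as np
        from scipy.spatial.distance import jensenshannon
        from sklearn.svm import SVC

        def js_kernel(P, Q, gamma=1.0):
            K = np.empty((len(P), len(Q)))
            for i, p in enumerate(P):
                for j, q in enumerate(Q):
                    # jensenshannon returns the JS distance (sqrt of the divergence)
                    K[i, j] = np.exp(-gamma * jensenshannon(p, q) ** 2)
            return K

        rng = np.random.default_rng(0)
        # fake mixture proportions for two classes peaked on different topics
        P0, P1 = rng.dirichlet([8, 1, 1], 30), rng.dirichlet([1, 1, 8], 30)
        X, y = np.vstack([P0, P1]), np.array([0] * 30 + [1] * 30)
        clf = SVC(kernel='precomputed').fit(js_kernel(X, X), y)
        print(clf.score(js_kernel(X, X), y))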
  • Regularized nonnegative shared subspace learning
    Gupta, S., Phung, D., Adams, B. and Venkatesh, S.. Data Mining and Knowledge Discovery, 26(1):57-97, January 2013. [ | ]
    Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multi-task clustering and information retrieval. However, diversity among the data sources might outweigh the advantages of joint modeling and thus may result in performance degradation. To this end, we propose a regularized shared subspace learning framework which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by imposing a mutual orthogonality constraint on the constituent subspaces, which segregates the common patterns from the source-specific patterns and thus avoids performance degradation. Our approach is rooted in nonnegative matrix factorization and extends it to enable joint analysis of related data sources. Experiments performed using three real-world datasets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework appropriate for jointly analyzing related data sources, and is therefore applicable to a wider context in data mining.
    @ARTICLE { gupta_phung_adams_venkatesh_dami13,
        TITLE = { Regularized nonnegative shared subspace learning },
        AUTHOR = { Gupta, S. and Phung, D. and Adams, B. and Venkatesh, S. },
        JOURNAL = { Data Mining and Knowledge Discovery },
        YEAR = { 2013 },
        MONTH = { January },
        NUMBER = { 1 },
        PAGES = { 57-97 },
        VOLUME = { 26 },
        ABSTRACT = { Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multi-task clustering and information retrieval. However, diversity among the data sources might outweigh the advantages of joint modeling and thus may result in performance degradation. To this end, we propose a regularized shared subspace learning framework which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by imposing a mutual orthogonality constraint on the constituent subspaces, which segregates the common patterns from the source-specific patterns and thus avoids performance degradation. Our approach is rooted in nonnegative matrix factorization and extends it to enable joint analysis of related data sources. Experiments performed using three real-world datasets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework appropriate for jointly analyzing related data sources, and is therefore applicable to a wider context in data mining. },
        COMMENT = { coauthor },
        OWNER = { 14232334 },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2011.09.29 },
    }
J
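    The shared-versus-individual decomposition can be pictured with a bare-bones joint NMF: X1 ~ [Wc W1] H1 and X2 ~ [Wc W2] H2, fit by standard multiplicative updates. The mutual orthogonality regulariser that is central to the paper is omitted here for brevity, and the dimensions and data are arbitrary.

        import numpy as np

        rng = np.random.default_rng(0)
        eps = 1e-9
        d, n1, n2, ks, ki = 20, 40, 40, 3, 2
        X1, X2 = rng.random((d, n1)), rng.random((d, n2))
        Wc, W1, W2 = rng.random((d, ks)), rng.random((d, ki)), rng.random((d, ki))
        H1, H2 = rng.random((ks + ki, n1)), rng.random((ks + ki, n2))

        for _ in range(200):
            A1, A2 = np.hstack([Wc, W1]), np.hstack([Wc, W2])
            H1 *= (A1.T @ X1) / (A1.T @ A1 @ H1 + eps)
            H2 *= (A2.T @ X2) / (A2.T @ A2 @ H2 + eps)
            # the shared dictionary Wc receives gradients from both sources
            num = X1 @ H1[:ks].T + X2 @ H2[:ks].T
            den = A1 @ H1 @ H1[:ks].T + A2 @ H2 @ H2[:ks].T + eps
            Wc *= num / den
            A1, A2 = np.hstack([Wc, W1]), np.hstack([Wc, W2])  # refresh with new Wc
            W1 *= (X1 @ H1[ks:].T) / (A1 @ H1 @ H1[ks:].T + eps)
            W2 *= (X2 @ H2[ks:].T) / (A2 @ H2 @ H2[ks:].T + eps)

        print("source-1 error:",
              round(float(np.linalg.norm(X1 - np.hstack([Wc, W1]) @ H1)), 3))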
  • Split-Merge Augmented Gibbs Sampling for Hierarchical Dirichlet Processes
    Rana, S., Phung, D. and Venkatesh, S.. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 546-557, Gold Coast, Queensland, Australia, April 2013. [ | ]
    The Hierarchical Dirichlet Process (HDP) model is an important tool for topic analysis. Inference can be done through a Gibbs sampler using the auxiliary variable method. We propose a split-merge procedure to augment this method of inference, facilitating faster convergence. Whilst the incremental Gibbs sampler changes the topic assignment of each word conditioned on the previous observations and model hyperparameters, the split-merge sampler changes the topic assignments of a group of words in a single move. This allows efficient exploration of the state space. We evaluate the proposed sampler on a synthetic test set and two benchmark document corpora, and show that the proposed sampler enables the MCMC chain to converge faster to the desired stationary distribution.
    @INPROCEEDINGS { rana_phung_venkatesh_pakdd13,
        TITLE = { Split-Merge Augmented {G}ibbs Sampling for Hierarchical {D}irichlet Processes },
        AUTHOR = { Rana, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2013 },
        ADDRESS = { Gold Coast, Queensland, Australia },
        MONTH = { April },
        PAGES = { 546-557 },
        ABSTRACT = { The Hierarchical Dirichlet Process (HDP) model is an important tool for topic analysis. Inference can be done through a Gibbs sampler using the auxiliary variable method. We propose a split-merge procedure to augment this method of inference, facilitating faster convergence. Whilst the incremental Gibbs sampler changes the topic assignment of each word conditioned on the previous observations and model hyperparameters, the split-merge sampler changes the topic assignments of a group of words in a single move. This allows efficient exploration of the state space. We evaluate the proposed sampler on a synthetic test set and two benchmark document corpora, and show that the proposed sampler enables the MCMC chain to converge faster to the desired stationary distribution. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
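    The split-merge move itself has a compact skeleton: propose splitting one topic's words into two (or merging two topics into one) and accept with a Metropolis-Hastings ratio, so a whole group of assignments changes in a single move. The sketch below strips this down to a toy partition sampler with a placeholder log-joint; the paper's version operates inside the full HDP Gibbs sampler.

        import numpy as np

        rng = np.random.default_rng(0)

        def log_joint(z):  # placeholder: log p(data, partition) up to a constant
            return -0.5 * len(set(z.tolist()))  # toy prior penalising many clusters

        def split_merge_move(z):
            i, j = rng.choice(len(z), size=2, replace=False)
            proposal = z.copy()
            if z[i] == z[j]:  # propose a split: randomly halve the shared cluster
                members = np.flatnonzero(z == z[i])
                moved = members[rng.random(len(members)) < 0.5]
                proposal[moved] = z.max() + 1
                log_q = -len(members) * np.log(2)  # log-prob of this random split
                log_accept = log_joint(proposal) - log_joint(z) - log_q
            else:  # propose a merge (deterministic given i, j)
                proposal[z == z[j]] = z[i]
                members = np.flatnonzero(proposal == z[i])
                log_q = -len(members) * np.log(2)  # reverse-split log-probability
                log_accept = log_joint(proposal) - log_joint(z) + log_q
            return proposal if np.log(rng.random()) < log_accept else z

        z = np.zeros(10, dtype=int)
        for _ in range(20):
            z = split_merge_move(z)
        print(z)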
  • Latent Patient Profile Modelling and Applications with Mixed-Variate Restricted Boltzmann Machine
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Advances in Knowledge Discovery and Data Mining, pages 123-135, Gold Coast, Queensland, Australia, April 2013. [ | ]
    Efficient management of chronic diseases is critical in modern health care. We consider diabetes mellitus, and our ongoing goal is to examine how machine learning can deliver information for clinical efficiency. The challenge is to aggregate highly heterogeneous sources, including demographics, diagnoses, pathologies and treatments, and extract similar groups so that care plans can be designed. To this end, we propose to use a recent advance, the mixed-variate restricted Boltzmann machine (MV.RBM), as it seamlessly integrates multiple data types for each patient aggregated over time and outputs a homogeneous representation called a “latent profile” that can be used for patient clustering, visualisation, disease correlation analysis and prediction. We demonstrate that the method outperforms all baselines on these tasks: the primary characteristics of patients in the same groups can be identified, and good results are achieved for diagnosis code prediction.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_pakdd13,
        TITLE = { Latent Patient Profile Modelling and Applications with Mixed-Variate Restricted {B}oltzmann Machine },
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Advances in Knowledge Discovery and Data Mining },
        YEAR = { 2013 },
        ADDRESS = { Gold Coast, Queensland, Australia },
        MONTH = { April },
        ISBN = { 978-3-642-37452-4 },
        PAGES = { 123-135 },
        PUBLISHER = { Springer-Verlag Berlin Heidelberg },
        VOLUME = { 7818 },
        ABSTRACT = { Efficient management of chronic diseases is critical in modern health care. We consider diabetes mellitus, and our ongoing goal is to examine how machine learning can deliver information for clinical efficiency. The challenge is to aggregate highly heterogeneous sources, including demographics, diagnoses, pathologies and treatments, and extract similar groups so that care plans can be designed. To this end, we propose to use a recent advance, the mixed-variate restricted Boltzmann machine (MV.RBM), as it seamlessly integrates multiple data types for each patient aggregated over time and outputs a homogeneous representation called a “latent profile” that can be used for patient clustering, visualisation, disease correlation analysis and prediction. We demonstrate that the method outperforms all baselines on these tasks: the primary characteristics of patients in the same groups can be identified, and good results are achieved for diagnosis code prediction. },
        OWNER = { tund },
        TIMESTAMP = { 2013.01.07 },
    }
C
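    For intuition, the binary special case of an RBM trained with one-step contrastive divergence is shown below; the "latent profile" is the vector of hidden activations. MV.RBM extends this to mixed visible types (binary, categorical, count, continuous), which this toy version does not attempt, and the data here are random stand-ins for patient records.

        import numpy as np

        rng = np.random.default_rng(0)
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

        n_visible, n_hidden, lr = 6, 3, 0.1
        W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        b, c = np.zeros(n_visible), np.zeros(n_hidden)
        X = rng.integers(0, 2, size=(100, n_visible)).astype(float)  # toy records

        for _ in range(200):  # CD-1 updates over the whole batch
            ph = sigmoid(X @ W + c)  # hidden activation probabilities
            h = (rng.random(ph.shape) < ph).astype(float)
            pv = sigmoid(h @ W.T + b)  # one Gibbs step back to the visibles
            ph2 = sigmoid(pv @ W + c)
            W += lr * (X.T @ ph - pv.T @ ph2) / len(X)
            b += lr * (X - pv).mean(axis=0)
            c += lr * (ph - ph2).mean(axis=0)

        print("latent profile of first record:", sigmoid(X[:1] @ W + c).round(2))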
  • Clustering Patient Medical Records via Sparse Subspace Representation
    Saha, B., Phung, D., Pham, D. and Venkatesh, S.. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 123-134, Gold Coast, Queensland, Australia, April 2013. [ | ]
    The health industry is facing an increasing challenge with "big data", as traditional methods fail to manage its scale and complexity. This paper examines the clustering of patient records for chronic diseases to facilitate a better construction of care plans. We solve this problem under the framework of subspace clustering. Our novel contribution lies in the exploitation of sparse representation to discover subspaces automatically and in a domain-specific construction of weighting matrices for patient records. We show the new formulation is readily solved by extending existing ℓ1-regularized optimization algorithms. Using a cohort of both diabetes and stroke data, we show that we outperform existing benchmark clustering techniques in the literature.
    @INPROCEEDINGS { saha_phung_pham_venkatesh_pakdd13,
        TITLE = { Clustering Patient Medical Records via Sparse Subspace Representation },
        AUTHOR = { Saha, B. and Phung, D. and Pham, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2013 },
        ADDRESS = { Gold Coast, Queensland, Australia },
        MONTH = { April },
        PAGES = { 123-134 },
        ABSTRACT = { The health industry is facing an increasing challenge with "big data", as traditional methods fail to manage its scale and complexity. This paper examines the clustering of patient records for chronic diseases to facilitate a better construction of care plans. We solve this problem under the framework of subspace clustering. Our novel contribution lies in the exploitation of sparse representation to discover subspaces automatically and in a domain-specific construction of weighting matrices for patient records. We show the new formulation is readily solved by extending existing ℓ1-regularized optimization algorithms. Using a cohort of both diabetes and stroke data, we show that we outperform existing benchmark clustering techniques in the literature. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Learning sparse latent representation and distance metric for image retrieval
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In IEEE International Conference on Multimedia and Expo (ICME), pages 1-6, 2013. [ | ]
    The performance of image retrieval depends critically on the semantic representation and the distance function used to estimate the similarity of two images. A good representation should integrate multiple visual and textual (e.g., tag) features and offer a step closer to the true semantics of interest (e.g., concepts). As the distance function operates on the representation, they are interdependent and thus should be addressed at the same time. We propose a probabilistic solution to learn both the representation, from multiple feature types and modalities, and the distance metric from data. The learning is regularised so that the learned representation and information-theoretic metric will (i) preserve the regularities of the visual/textual spaces, (ii) enhance structured sparsity, (iii) encourage small intra-concept distances, and (iv) keep inter-concept images separated. We demonstrate the capacity of our method on the NUS-WIDE data. For the well-studied 13-animal subset, our method outperforms state-of-the-art rivals. On the subset of single-concept images, we gain a 79.5% improvement over the standard nearest-neighbours approach on the MAP score, and 45.7% on the NDCG.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_icme13,
        TITLE = { Learning sparse latent representation and distance metric for image retrieval },
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { IEEE International Conference on Multimedia and Expo (ICME) },
        YEAR = { 2013 },
        PAGES = { 1-6 },
        ABSTRACT = { The performance of image retrieval depends critically on the semantic representation and the distance function used to estimate the similarity of two images. A good representation should integrate multiple visual and textual (e.g., tag) features and offer a step closer to the true semantics of interest (e.g., concepts). As the distance function operates on the representation, they are interdependent and thus should be addressed at the same time. We propose a probabilistic solution to learn both the representation, from multiple feature types and modalities, and the distance metric from data. The learning is regularised so that the learned representation and information-theoretic metric will (i) preserve the regularities of the visual/textual spaces, (ii) enhance structured sparsity, (iii) encourage small intra-concept distances, and (iv) keep inter-concept images separated. We demonstrate the capacity of our method on the NUS-WIDE data. For the well-studied 13-animal subset, our method outperforms state-of-the-art rivals. On the subset of single-concept images, we gain a 79.5% improvement over the standard nearest-neighbours approach on the MAP score, and 45.7% on the NDCG. },
        DOI = { 10.1109/ICME.2013.6607435 },
        ISSN = { 1945-7871 },
        KEYWORDS = { Image retrieval;Mixed-Variate;NUS-WIDE;Restricted Boltzmann Machines;metric learning;sparsity },
        OWNER = { tund },
        TIMESTAMP = { 2013.10.15 },
    }
C
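    As a stand-in for the metric-learning half of the story, the sketch below fits a linear (Mahalanobis-style) metric with scikit-learn's neighbourhood components analysis and ranks gallery items by learned distance. The paper learns the representation and an information-theoretic metric jointly with structured-sparsity regularisers, none of which this toy does; data and labels are synthetic.

        import numpy as np
        from sklearn.neighbors import NeighborhoodComponentsAnalysis

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (30, 8)), rng.normal(2, 1, (30, 8))])
        y = np.array([0] * 30 + [1] * 30)  # concept labels
        nca = NeighborhoodComponentsAnalysis(n_components=4, random_state=0).fit(X, y)
        Z = nca.transform(X)  # embedding induced by the learned metric
        query, gallery = Z[0], Z[1:]
        ranked = np.argsort(np.linalg.norm(gallery - query, axis=1))
        print("top-5 retrieved items:", ranked[:5] + 1)  # +1: gallery starts at item 1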
  • Exploiting Side Information in Distance Dependent Chinese Restaurant Processes for Data Clustering
    Li, C., Phung, D., Rana, S. and Venkatesh, S.. In The 2013 IEEE International Conference on Multimedia and Expo (ICME 2013), San Jose, California, USA, July 2013. [ | | ]
    Multimedia content often comes with weakly annotated data such as tags, links and interactions, collectively called side information. This auxiliary information provides hints for exploring the link structure of the data. Most clustering algorithms utilize the pure data alone. A model that combines pure data and side information, such as images and tags or documents and keywords, can perform better at understanding the underlying structure of the data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian nonparametric model, the distance dependent Chinese restaurant process (DD-CRP). When side information takes the form of subsets of discrete labels, our algorithm embeds the affinity of this information into the decay function of the DD-CRP. This gives the flexibility to measure distance from arbitrary side information rather than only the spatial layout or timestamps of observations. At the same time, for noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, thus avoiding side effects from noisy and incomplete side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves clustering performance.
    @INPROCEEDINGS { li_phung_rana_venkatesh_icme13,
        TITLE = { Exploiting Side Information in Distance Dependent Chinese Restaurant Processes for Data Clustering },
        AUTHOR = { Li, C. and Phung, D. and Rana, S. and Venkatesh, S. },
        BOOKTITLE = { The 2013 IEEE International Conference on Multimedia and Expo (ICME 2013) },
        YEAR = { 2013 },
        ADDRESS = { San Jose, California, USA },
        MONTH = { July },
        ABSTRACT = { Multimedia content often comes with weakly annotated data such as tags, links and interactions, collectively called side information. This auxiliary information provides hints for exploring the link structure of the data. Most clustering algorithms utilize the pure data alone. A model that combines pure data and side information, such as images and tags or documents and keywords, can perform better at understanding the underlying structure of the data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian nonparametric model, the distance dependent Chinese restaurant process (DD-CRP). When side information takes the form of subsets of discrete labels, our algorithm embeds the affinity of this information into the decay function of the DD-CRP. This gives the flexibility to measure distance from arbitrary side information rather than only the spatial layout or timestamps of observations. At the same time, for noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, thus avoiding side effects from noisy and incomplete side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves clustering performance. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.04.12 },
        URL = { 2013/coference/Cheng_etal_ICME2013.pdf },
    }
C
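    The decay-function idea can be sketched directly: each item links to another with probability proportional to a similarity computed from its side information, with a self-link playing the role of the "new table"; with all similarities zero the prior collapses to the ordinary CRP. The Jaccard similarity over label sets below is an illustrative choice, not necessarily the paper's decay.

        import numpy as np

        rng = np.random.default_rng(0)

        def ddcrp_links(tags, alpha=1.0):
            """Sample customer->customer links; tags[i] is a set of labels for item i."""
            n = len(tags)
            links = np.empty(n, dtype=int)
            for i in range(n):
                f = np.zeros(n)
                for j in range(n):
                    if j != i:
                        union = len(tags[i] | tags[j]) or 1
                        f[j] = len(tags[i] & tags[j]) / union  # label-overlap decay
                f[i] = alpha  # self-link ~ "sit at a new table"
                links[i] = rng.choice(n, p=f / f.sum())
            return links  # connected components of the link graph give the clusters

        tags = [{"cat", "pet"}, {"dog", "pet"}, {"car"}, {"car", "road"}]
        print(ddcrp_links(tags))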
  • Online Social Capital: Mood, Topical and Psycholinguistic Analysis
    Nguyen, T., Dao, B., Phung, D., Venkatesh, S. and Berk, M.. In AAAI Int. Conf on Weblogs and Social Media (ICWSM), pages 449-456, Boston, USA, July 2013. [ | | pdf]
    Social media provides rich sources of personal information and community interaction which can be linked to aspects of mental health. In this paper we investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and authors' moods, in a large corpus of blog posts, to analyze aspects of social capital in social media communities. Using data collected from LiveJournal, we find that bloggers with lower social capital express fewer positive moods and more negative moods than those with higher social capital. We also find that people with low social capital have more random mood swings over time than people with high social capital. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and psycholinguistic features derived from blog posts, suggesting discriminative features that prove useful for classification tasks. Good prediction is achieved when classifying social capital groups using topic and linguistic features, with linguistic features found to have greater predictive power than latent topics. The significance of our work lies in the importance of online social capital to the potential construction of automated healthcare monitoring systems. We further establish the link between mood and social capital in online communities, suggesting a foundation for new systems to monitor online mental well-being.
    @INPROCEEDINGS { nguyen_dao_phung_venkatesh_berk_icwsm13,
        TITLE = { Online Social Capital: Mood, Topical and Psycholinguistic Analysis },
        AUTHOR = { Nguyen, T. and Dao, B. and Phung, D. and Venkatesh, S. and Berk, M. },
        BOOKTITLE = { AAAI Int. Conf on Weblogs and Social Media (ICWSM) },
        YEAR = { 2013 },
        ADDRESS = { Boston, USA },
        MONTH = { July },
        PAGES = { 449-456 },
        ABSTRACT = { Social media provides rich sources of personal information and community interaction which can be linked to aspects of mental health. In this paper we investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and authors' moods, in a large corpus of blog posts, to analyze aspects of social capital in social media communities. Using data collected from LiveJournal, we find that bloggers with lower social capital express fewer positive moods and more negative moods than those with higher social capital. We also find that people with low social capital have more random mood swings over time than people with high social capital. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and psycholinguistic features derived from blog posts, suggesting discriminative features that prove useful for classification tasks. Good prediction is achieved when classifying social capital groups using topic and linguistic features, with linguistic features found to have greater predictive power than latent topics. The significance of our work lies in the importance of online social capital to the potential construction of automated healthcare monitoring systems. We further establish the link between mood and social capital in online communities, suggesting a foundation for new systems to monitor online mental well-being. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/nguyen_dao_phung_venkatesh_berk_icwsm13.pdf },
    }
C
  • Analysis of Psycholinguistic Processes and Topics in Online Autism Communities
    Nguyen, T., Phung, D. and Venkatesh, S.. In The IEEE International Conference on Multimedia and Expo (ICME), San Jose, USA, July 2013. [ | | pdf]
    The growing number of individuals with autism spectrum disorder (ASD) requires continuous support and care. With the popularity of social media, online communities of people affected by ASD have emerged. This paper presents an analysis of these online communities, characterising them not in terms of friendship, exchange of information, social support or recreation, but through the topics and linguistic styles that people express in their online writing. Using data collected unobtrusively from LiveJournal, we analyze posts made by ten autism communities in conjunction with those made by a control group of standard communities. Significant differences are found between autism and control communities when characterized by latent topics of discussion and psycholinguistic features. Latent topics are found to have greater predictive power than linguistic features when classifying blog posts as belonging to either an autism or a control community. This study suggests that data mining of online blogs has the potential to detect clinically meaningful data. It opens the door to possibilities including sentinel risk surveillance and harnessing the power in diverse large datasets.
    @INPROCEEDINGS { nguyen_phung_venkatesh_icme13,
        TITLE = { Analysis of Psycholinguistic Processes and Topics in Online Autism Communities },
        AUTHOR = { Nguyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { The IEEE International Conference on Multimedia and Expo (ICME) },
        YEAR = { 2013 },
        ADDRESS = { San Jose, USA },
        MONTH = { July },
        ABSTRACT = { The growing number of individuals with autism spectrum disorder (ASD) requires continuous support and care. With the popularity of social media, online communities of people affected by ASD have emerged. This paper presents an analysis of these online communities, characterising them not in terms of friendship, exchange of information, social support or recreation, but through the topics and linguistic styles that people express in their online writing. Using data collected unobtrusively from LiveJournal, we analyze posts made by ten autism communities in conjunction with those made by a control group of standard communities. Significant differences are found between autism and control communities when characterized by latent topics of discussion and psycholinguistic features. Latent topics are found to have greater predictive power than linguistic features when classifying blog posts as belonging to either an autism or a control community. This study suggests that data mining of online blogs has the potential to detect clinically meaningful data. It opens the door to possibilities including sentinel risk surveillance and harnessing the power in diverse large datasets. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/nguyen_phung_venkatesh_icme13.pdf },
    }
C
  • Toby Playpad: Empowering Parents to Provide Early Therapy in the Home
    Venkatesh, S., Phung, D., Greenhill, S., Duong, T. and Adams, B.. In Proceedings of the International Meeting for Autism Research (IMFAR), page (accepted), Donostia, Spain, May 2013. [ | ]
    @INPROCEEDINGS { venkatesh_phung_greenhill_duong_adams_imfar13,
        TITLE = { {Toby Playpad}: Empowering Parents to Provide Early Therapy in the Home },
        AUTHOR = { Venkatesh, S. and Phung, D. and Greenhill, S. and Duong, T. and Adams, B. },
        BOOKTITLE = { Proceedings of the International Meeting for Autism Research (IMFAR) },
        YEAR = { 2013 },
        ADDRESS = { Donostia, Spain },
        MONTH = { May },
        PAGES = { (accepted) },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Toby Playpad: Empowering Parents to Provide Early Therapy in the Home (extended abstract)
    Venkatesh, S., Duong, T., Phung, D., Greenhill, S., Adams, B., Marshall, W. and Cairns, D.. In Proc. of BioAutism Conference, Australian Society for Autism Research (ASFAR), Melbourne, Australia, February 2013. [ | ]
    @INPROCEEDINGS { venkatesh_duong_phung_greenhill_adams_marshall_cairns_bioautism13,
        TITLE = { {Toby Playpad}: Empowering Parents to Provide Early Therapy in the Home (extended abstract) },
        AUTHOR = { Venkatesh, S. and Duong, T. and Phung, D. and Greenhill, S. and Adams, B. and Marshall, W. and Cairns, D. },
        BOOKTITLE = { Proc. of BioAutism Conference, Australian Society for Autism Research (ASFAR) },
        YEAR = { 2013 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { February },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Interactive Browsing System for Anomaly Video Surveillance
    Nguyen, T.V., Phung, D., Gupta, S. and Venkatesh, S.. In Proc. of IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pages 384-389, Melbourne, Australia, April 2013. [ | ]
    Existing anomaly detection methods in video surveillance exhibit a lack of congruence between the rare events detected by algorithms and what users consider anomalous. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian nonparametric factor analysis. Each factor corresponds to a typical motion pattern. A rarity score is computed for each factor, and factors are ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the chosen rare factors. We present the user with an interface to inspect events that incorporate these rarest factors in a spatio-temporal manner. We demonstrate the system on a public video dataset, showing key aspects of the browsing paradigm.
    @INPROCEEDINGS { nguyen_phung_gupta_venkatesh_issnip13,
        TITLE = { Interactive Browsing System for Anomaly Video Surveillance },
        AUTHOR = { Nguyen, T.V. and Phung, D. and Gupta, S. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) },
        YEAR = { 2013 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { April },
        PAGES = { 384-389 },
        ABSTRACT = { Existing anomaly detection methods in video surveillance exhibit a lack of congruence between the rare events detected by algorithms and what users consider anomalous. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian nonparametric factor analysis. Each factor corresponds to a typical motion pattern. A rarity score is computed for each factor, and factors are ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the chosen rare factors. We present the user with an interface to inspect events that incorporate these rarest factors in a spatio-temporal manner. We demonstrate the system on a public video dataset, showing key aspects of the browsing paradigm. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
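    The rarity-driven browsing loop reduces to a simple ranking once factors are in hand: score each factor by how seldom it is active, order factors rarest-first, and surface the frames containing the selected rare factors. The random factor loadings below merely stand in for the Bayesian nonparametric factor analysis the paper performs.

        import numpy as np

        rng = np.random.default_rng(0)
        n_frames, n_factors = 200, 8
        weights = rng.gamma(0.3, 1.0, size=(n_frames, n_factors))  # fake loadings
        active = weights > 1.0  # factor "present" in a frame

        rarity = 1.0 - active.mean(axis=0)  # rarer factors score higher
        order = np.argsort(-rarity)  # browse factors rarest-first
        top = order[0]
        frames = np.flatnonzero(active[:, top])
        print(f"factor {top}: rarity {rarity[top]:.2f}, {len(frames)} candidate frames")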
  • TOBY playpad application to teach children with ASD: A pilot trial
    Moore, D., Venkatesh, S., Anderson, A., Phung, D., Greenhill, S., Duong, T., Cairns, D., Marshall, W. and Whitehouse, A.. Developmental Neurorehabilitation, 2013. [ | ]
    Purpose: To investigate use patterns and learning outcomes associated with the use of TOBY Playpad, an early intervention iPad application. Methods: Participants were 33 families with a child with an ASD aged 16 years or less, with a diagnosis of Autism or Pervasive Developmental Disorder – Not Otherwise Specified and no secondary diagnoses. Families were provided with TOBY and asked to use it for four to six weeks, without further prompting or coaching. Dependent variables included participant use patterns and initial indicators of child progress. Results: Twenty-three participants engaged extensively with TOBY, being exposed to at least 100 complete learn units (CLUs) and completing between 17% and 100% of the curriculum. Conclusions: TOBY may make a useful contribution to early intervention programming for children with ASD, delivering high rates of appropriate learning opportunities. Further research evaluating the efficacy of TOBY in relation to independent indicators of functioning is warranted.
    @ARTICLE { Moore_Venkatesh_Anderson_Greenhill_Phung_Duong_Cairns_Marshall_Whitehouse_DevNeu13,
        TITLE = { {TOBY} playpad application to teach children with ASD: A pilot trial },
        AUTHOR = { Moore, D. and Venkatesh, S. and Anderson, A. and Phung, D. and Greenhill, S. and Duong, T. and Cairns, D. and Marshall, W. and Whitehouse, A. },
        JOURNAL = { Developmental Neurorehabilitation },
        YEAR = { 2013 },
        PAGES = { 1-5 },
        ABSTRACT = { Purpose: To investigate use patterns and learning outcomes associated with the use of TOBY Playpad, an early intervention iPad application. Methods: Participants were 33 families with a child with an ASD aged 16 years or less, with a diagnosis of Autism or Pervasive Developmental Disorder – Not Otherwise Specified and no secondary diagnoses. Families were provided with TOBY and asked to use it for four to six weeks, without further prompting or coaching. Dependent variables included participant use patterns and initial indicators of child progress. Results: Twenty-three participants engaged extensively with TOBY, being exposed to at least 100 complete learn units (CLUs) and completing between 17% and 100% of the curriculum. Conclusions: TOBY may make a useful contribution to early intervention programming for children with ASD, delivering high rates of appropriate learning opportunities. Further research evaluating the efficacy of TOBY in relation to independent indicators of functioning is warranted. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.04.15 },
    }
J
  • TOBY: Therapy Outcome By You
    Venkatesh, S., Greenhill, S., Phung, D., Duong, T., Adams, B., Marshall, W. and Cairns, D.. In Proceedings of the Annual Autism Conference, Portland, USA, January 2013. [ | ]
    Early intervention is critical for children diagnosed with autism. Unfortunately, there is often a long gap of waiting, and wasting, time between a “formal” diagnosis and therapy. We describe TOBY Playpad (www.tobyplaypad.com), whose goal is to close this gap by empowering parents to help their children early. TOBY stands for Therapy Outcome By You and is currently an iPad application. It provides an adaptive syllabus of more than 320 activities, developed by autism and machine learning experts, targeting key development areas known to be areas of deficit for children with ASD, such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad and off-iPad Natural Environment Task (NET) activities. TOBY is highly adaptive and personalized, intelligently increasing its complexity and varying prompts and reinforcements as the child progresses over time. Prompting and reinforcing strategies are also recommended to parents to make the most of everyday opportunities to teach children. Essentially, TOBY removes from parents the burden of extensive preparation of materials and manual data recording. Three trials, with 20, 50 and 36 children, have been conducted with AutismWest (www.autismwest.org.au) since last year. The results are promising, providing evidence of learning in skills that some children previously lacked. NET activities are shown to be effective for children and popular with parents.
    @INPROCEEDINGS { venkatesh_phung_greenhill_duong_adams_marshall_cairns_abai13,
        TITLE = { {TOBY}: Therapy Outcome By You },
        AUTHOR = { Venkatesh, S. and Greenhill, S. and Phung, D. and Duong, T. and Adams, B. and Marshall, W. and Cairns, D. },
        BOOKTITLE = { Proceedings of the Annual Autism Conference },
        YEAR = { 2013 },
        ADDRESS = { Portland, USA },
        MONTH = { January },
        ABSTRACT = { Early intervention is critical for children diagnosed with autism. Unfortunately, there is often a long gap of waiting, and wasting, time between a “formal” diagnosis and therapy. We describe TOBY Playpad (www.tobyplaypad.com), whose goal is to close this gap by empowering parents to help their children early. TOBY stands for Therapy Outcome By You and is currently an iPad application. It provides an adaptive syllabus of more than 320 activities, developed by autism and machine learning experts, targeting key development areas known to be areas of deficit for children with ASD, such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad and off-iPad Natural Environment Task (NET) activities. TOBY is highly adaptive and personalized, intelligently increasing its complexity and varying prompts and reinforcements as the child progresses over time. Prompting and reinforcing strategies are also recommended to parents to make the most of everyday opportunities to teach children. Essentially, TOBY removes from parents the burden of extensive preparation of materials and manual data recording. Three trials, with 20, 50 and 36 children, have been conducted with AutismWest (www.autismwest.org.au) since last year. The results are promising, providing evidence of learning in skills that some children previously lacked. NET activities are shown to be effective for children and popular with parents. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
2012
  • Conditionally Dependent Dirichlet Processes for Modelling Naturally Correlated Data Sources
    Phung, D., Nguyen, X., Bui, H., Nguyen, T.V. and Venkatesh, S.. Technical report, Pattern Recognition and Data Analytics, Deakin University, 2012. [ | | pdf]
    We introduce a new class of conditionally dependent Dirichlet processes (CDP) for hierarchical mixture modelling of naturally correlated data sources. This class of models provides a Bayesian nonparametric approach for modelling a range of challenging datasets which typically consist of heterogeneous observations from multiple correlated data channels. Typical examples include annotated social media; community networks, where friendship and relation information coexists with users' profiles; and medical records, where patient information exists in several dimensions (demographic information, medical history, drug use and so on). The proposed framework can easily be tailored to model multiple data sources which are correlated by some latent underlying processes, whereas most existing topic models, notably the hierarchical Dirichlet process (HDP), are designed for only a single data observation channel. In these existing approaches, data are grouped into documents (e.g., text documents, or groups defined by covariates such as time or location). Our approach is different: we view context as distributions over some index space and model both topics and contexts jointly. Distributions over topic parameters are modelled according to the usual Dirichlet processes. The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces; loosely speaking, we use a stochastic process, the DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit the modelling needs or data types of the contexts (such as Beta or Gaussian processes); the Dirichlet process is of course an obvious choice. Our model can be viewed as an integration of the hierarchical Dirichlet process (HDP) and the recent nested Dirichlet process (nDP) with shared mixture components. In fact, it admits an interesting interpretation in which, under a suitable parameterization, integrating out the topic components results in a nested DP, whereas integrating out the context components results in a hierarchical DP. Different approaches for posterior inference exist; this paper focuses on the development of an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets for temporal topic modelling and trajectory discovery in video surveillance. We then demonstrate an application to a current visual category classification challenge in computer vision, on which we significantly outperform the currently reported state-of-the-art results. Finally, it is worthwhile to note that our proposed approach can easily be adapted to accommodate different forms of supervision (weakly annotated data, semi-supervision) and to perform prediction.
    @TECHREPORT { phung_nguyen_bui_nguyen_venkatesh_tr12,
        TITLE = { Conditionally Dependent {D}irichlet Processes for Modelling Naturally Correlated Data Sources },
        AUTHOR = { Phung, D. and Nguyen, X. and Bui, H. and Nguyen, T.V. and Venkatesh, S. },
        INSTITUTION = { Pattern Recognition and Data Analytics, Deakin University },
        YEAR = { 2012 },
        ABSTRACT = { We introduce a new class of conditionally dependent Dirichlet processes (CDP) for hierarchical mixture modelling of naturally correlated data sources. This class of models provides a Bayesian nonparametric approach for modelling a range of challenging datasets which typically consist of heterogeneous observations from multiple correlated data channels. Typical examples include annotated social media; community networks, where friendship and relation information coexists with users' profiles; and medical records, where patient information exists in several dimensions (demographic information, medical history, drug use and so on). The proposed framework can easily be tailored to model multiple data sources which are correlated by some latent underlying processes, whereas most existing topic models, notably the hierarchical Dirichlet process (HDP), are designed for only a single data observation channel. In these existing approaches, data are grouped into documents (e.g., text documents, or groups defined by covariates such as time or location). Our approach is different: we view context as distributions over some index space and model both topics and contexts jointly. Distributions over topic parameters are modelled according to the usual Dirichlet processes. The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces; loosely speaking, we use a stochastic process, the DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit the modelling needs or data types of the contexts (such as Beta or Gaussian processes); the Dirichlet process is of course an obvious choice. Our model can be viewed as an integration of the hierarchical Dirichlet process (HDP) and the recent nested Dirichlet process (nDP) with shared mixture components. In fact, it admits an interesting interpretation in which, under a suitable parameterization, integrating out the topic components results in a nested DP, whereas integrating out the context components results in a hierarchical DP. Different approaches for posterior inference exist; this paper focuses on the development of an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets for temporal topic modelling and trajectory discovery in video surveillance. We then demonstrate an application to a current visual category classification challenge in computer vision, on which we significantly outperform the currently reported state-of-the-art results. Finally, it is worthwhile to note that our proposed approach can easily be adapted to accommodate different forms of supervision (weakly annotated data, semi-supervision) and to perform prediction. },
        OWNER = { phung },
        TIMESTAMP = { 2012.10.31 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_etal_tr12.pdf },
    }
R
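    The stick-breaking representation at the heart of the construction is easy to simulate: break a unit stick into weights and attach an atom to each stick, giving an explicit draw from a (truncated) Dirichlet process. The truncation level and the N(0,1) base measure below are arbitrary choices for the sketch.

        import numpy as np

        rng = np.random.default_rng(0)

        def stick_breaking(alpha, truncation=50):
            betas = rng.beta(1.0, alpha, size=truncation)
            remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
            weights = betas * remaining  # lengths of the successive stick pieces
            atoms = rng.normal(size=truncation)  # draws from a N(0,1) base measure
            return weights, atoms

        w, atoms = stick_breaking(alpha=2.0)
        print("first 10 weights:", w[:10].round(3), "| total mass:", w.sum().round(3))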
  • A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning
    Gupta, S., Phung, D. and Venkatesh, S.. In International Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, USA, August 2012. [ | ]
    @INPROCEEDINGS { gupta_phung_venkatesh_uai12,
        TITLE = { A Slice Sampler for Restricted Hierarchical {B}eta Process with Applications to Shared Subspace Learning },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2012 },
        ADDRESS = { Catalina Island, USA },
        MONTH = { August },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, D. and Venkatesh, S.. In Proc. of SIAM Int. Conference on Data Mining (SDM), Anaheim, California, USA, April 2012. [ | ]
    Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application to two real-world problems: transfer learning in text and image retrieval.
    @INPROCEEDINGS { gupta_phung_venkatesh_sdm12,
        TITLE = { A {B}ayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of SIAM Int. Conference on Data Mining (SDM) },
        YEAR = { 2012 },
        ADDRESS = { Anaheim, California, USA },
        MONTH = { April },
        ABSTRACT = { Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application to two real-world problems: transfer learning in text and image retrieval. },
    }
C
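    A toy beta-Bernoulli picture of "how much is shared": draw per-factor usage probabilities, then per-source binary activation masks; factors active in both sources act as the shared subspace. The hierarchical beta process makes the number of factors effectively unbounded and ties the sources properly, whereas this sketch truncates to 15 factors and samples the masks independently.

        import numpy as np

        rng = np.random.default_rng(0)
        K = 15
        pi = rng.beta(1.0, 3.0, size=K)  # global factor usage probabilities
        src1 = rng.random(K) < pi        # factor mask for data source 1
        src2 = rng.random(K) < pi        # factor mask for data source 2
        print("source-1 factors:", int(src1.sum()),
              "| source-2 factors:", int(src2.sum()),
              "| shared:", int((src1 & src2).sum()))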
  • A Sequential Decision Approach to Ordinal Preferences in Recommender Systems
    Truyen, T., Phung, D. and Venkatesh, S.. In Proceedings of AAAI Conf. on Artificial Intelligence (AAAI), Toronto, Canada, July 2012. [ | ]
    We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level, and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings, and their modelling choices. To this end, we make novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach flexibly incorporates features of both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, the proposed work is demonstrated to be competitive against state-of-the-art collaborative filtering methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_aaai12,
        TITLE = { A Sequential Decision Approach to Ordinal Preferences in Recommender Systems },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2012 },
        ADDRESS = { Toronto, Canada },
        MONTH = { July },
        ABSTRACT = { We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level, and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings, and their modelling choices. To this end, we make novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach flexibly incorporates features of both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, the proposed work is demonstrated to be competitive against state-of-the-art collaborative filtering methods. },
        TIMESTAMP = { 2012.04.11 },
    }
C
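    A minimal sketch of the level-by-level rating mechanism described in the abstract above, assuming hypothetical per-level thresholds and a Gumbel utility (the type-I member of the generalised extreme value family); the paper's actual parameterisation, user/item features and learning procedure are not reproduced here.

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_rating(thresholds, loc=0.0, scale=1.0):
            """Sequential rating: draw a latent utility, start at level 1,
            and keep moving up while the utility clears the next threshold."""
            u = rng.gumbel(loc, scale)  # Gumbel = type-I generalised extreme value
            rating = 1
            for t in thresholds:        # thresholds for reaching levels 2..K
                if u > t:
                    rating += 1
                else:
                    break
            return rating

        # e.g. five draws on a 5-star scale with made-up thresholds
        print([sample_rating([-1.0, 0.0, 1.0, 2.0]) for _ in range(5)])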
  • Improved Subspace Clustering via Exploitation of Spatial Constraints
    Pham, S., Budhaditya, S., Phung, D. and Venkatesh, S.. In Proceedings of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Rhode Island, USA, June 2012. [ | ]
    We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. Experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. (See the sketch following this entry.)
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_cvpr12,
        TITLE = { Improved Subspace Clustering via Exploitation of Spatial Constraints },
        AUTHOR = { Pham, S. and Budhaditya, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2012 },
        ADDRESS = { Rhode Island, USA },
        MONTH = { June },
        ABSTRACT = { We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. Experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
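    The weighted sparse formulation above can be read as a lasso whose per-coefficient weights penalise links between spatially distant tracked points. The sketch below shows the generic weighted-lasso reduction (fold the weights into the dictionary, solve a standard lasso, unfold); the weight vector and regularisation strength are assumed inputs, and this is not the authors' exact algorithm for missing or corrupted data.

        import numpy as np
        from sklearn.linear_model import Lasso

        def weighted_sparse_code(X, j, w, lam=0.01):
            """Sparse self-representation of column j of X in terms of the
            other columns, with per-coefficient weights w (e.g. growing with
            spatial distance) discouraging geometrically implausible links."""
            n = X.shape[1]
            idx = np.delete(np.arange(n), j)
            D = X[:, idx] / w[idx]       # fold weights into the dictionary
            fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
            fit.fit(D, X[:, j])
            c = np.zeros(n)
            c[idx] = fit.coef_ / w[idx]  # unfold to recover the weighted solution
            return c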
  • Sparse Subspace Representation for Spectral Document Clustering
    Saha, B., Phung, D., Pham, D.S. and Venkatesh, S.. In IEEE International Conference on Data Mining (ICDM), Brussels, Belgium, December 2012. [ | ]
    @INPROCEEDINGS { saha_phung_pham_venkatesh_icdm12,
        TITLE = { Sparse Subspace Representation for Spectral Document Clustering },
        AUTHOR = { Saha, B. and Phung, D. and Pham, D.S. and Venkatesh, S. },
        BOOKTITLE = { IEEE International Conference on Data Mining (ICDM) },
        YEAR = { 2012 },
        ADDRESS = { Brussels, Belgium },
        MONTH = { December },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • A Sentiment-Aware Approach to Community Formation in Social Media
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. In AAAI Int. Conf. on Weblogs and Social Media (ICWSM), Dublin, Ireland, June 2012. [ | ]
    @INPROCEEDINGS { nguyen_phung_adams_venkatesh_icwsm12,
        TITLE = { A Sentiment-Aware Approach to Community Formation in Social Media },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { AAAI Int. Conf. on Weblogs and Social Media (ICWSM) },
        YEAR = { 2012 },
        ADDRESS = { Dublin, Ireland },
        MONTH = { June },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
    }
C
  • Cumulative restricted Boltzmann machines for ordinal matrix data analysis
    Truyen, T., Phung, D. and Venkatesh, S.. In Proceedings of Asian Conference on Machine Learning (ACML), Singapore, November 2012. [ | ]
    Ordinal data is omnipresent in almost all multiuser-generated feedback: questionnaires, preferences, etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profile of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend the application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments. (See the sketch following this entry.)
    @INPROCEEDINGS { truyen_phung_venkatesh_acml12a,
        TITLE = { Cumulative restricted {B}oltzmann machines for ordinal matrix data analysis },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of Asian Conference on Machine Learning (ACML) },
        YEAR = { 2012 },
        ADDRESS = { Singapore },
        MONTH = { November },
        ABSTRACT = { Ordinal data is omnipresent in almost all multiuser-generated feedback: questionnaires, preferences, etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profile of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend the application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments. },
    }
C
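    The "cumulative" construction can be illustrated separately from the RBM machinery: an ordinal level l in {1..K} is encoded as K-1 indicators of whether each threshold has been passed. A toy sketch (the Gaussian RBM that the paper places over such representations is not shown):

        import numpy as np

        def cumulative_encode(ratings, K):
            """Encode level l in {1..K} as K-1 'threshold passed' indicators,
            e.g. l=3, K=5 -> [1, 1, 0, 0]."""
            R = np.asarray(ratings)[:, None]
            thresholds = np.arange(1, K)[None, :]
            return (R > thresholds).astype(float)

        print(cumulative_encode([1, 3, 5], K=5))
        # [[0. 0. 0. 0.]
        #  [1. 1. 0. 0.]
        #  [1. 1. 1. 1.]]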
  • Learning From Ordered Sets and Applications in Collaborative Ranking
    Truyen, T., Phung, D. and Venkatesh, S.. In Proceedings of Asian Conference on Machine Learning (ACML), Singapore, November 2012. [ | ]
    Ranking over sets arises when users choose between groups of items, for example a group of movies they deem 5-star, or a customized tour package. It turns out that, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: the space size approaches (N!/2)6.93145^(N+1) as N approaches infinity. We propose a split-and-merge Metropolis-Hastings procedure that can explore the state space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods. (See the sketch following this entry.)
    @INPROCEEDINGS { truyen_phung_venkatesh_acml12b,
        TITLE = { Learning From Ordered Sets and Applications in Collaborative Ranking },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of Asian Conference on Machine Learning (ACML) },
        YEAR = { 2012 },
        ADDRESS = { Singapore },
        MONTH = { November },
        ABSTRACT = { Ranking over sets arises when users choose between groups of items, for example a group of movies they deem 5-star, or a customized tour package. It turns out that, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: the space size approaches (N!/2)6.93145^(N+1) as N approaches infinity. We propose a split-and-merge Metropolis-Hastings procedure that can explore the state space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.04.12 },
    }
C
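    To see why inference over ordered set partitions is hard, the exact space size (the Fubini, or ordered Bell, numbers) can be computed from the standard recurrence a(n) = sum_k C(n,k) a(n-k); it explodes super-exponentially even for modest N, which is what motivates the split-and-merge sampler. A short sketch:

        from math import comb

        def ordered_set_partitions(n):
            """Fubini number a(n): ways to partition an n-set into blocks
            and linearly order the blocks (a(0) = 1 by convention)."""
            a = [1] + [0] * n
            for m in range(1, n + 1):
                a[m] = sum(comb(m, k) * a[m - k] for k in range(1, m + 1))
            return a[n]

        for n in (3, 5, 10, 15):
            print(n, ordered_set_partitions(n))
        # 3 13 | 5 541 | 10 102247563 | 15 230283190977853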
  • Emotional Reactions to Real-World Events in Social Networks
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. In New Frontiers in Applied Data Mining, pages 53-64. Springer, 2012. [ | | pdf]
    A convergence of emotions among people in social networks can result from the occurrence of an unprecedented event in the real world; for example, a majority of bloggers reacted angrily to the September 11 terrorist attacks. Based on this observation, we introduce a sentiment index, computed from the current mood tags in a collection of blog posts utilizing an affective lexicon, which can reveal subtle events discussed in the blogosphere. We then develop a method for extracting events based on this index and its distribution. Our second contribution is the establishment of a new bursty structure in text streams, termed a sentiment burst. We employ a stochastic model to detect bursty periods of moods and the events associated with them. Our results on a dataset of more than 12 million mood-tagged blog posts over a 4-year period show that our sentiment-based bursty events are indeed meaningful, in several ways. (See the sketch following this entry.)
    @INCOLLECTION { nguyen_phung_adams_venkatesh_lncs12,
        TITLE = { Emotional Reactions to Real-World Events in Social Networks },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { New Frontiers in Applied Data Mining },
        PUBLISHER = { Springer },
        YEAR = { 2012 },
        EDITOR = { Cao, Longbing and Huang, Joshua and Bailey, James and Koh, Yun and Luo, Jun },
        PAGES = { 53--64 },
        ABSTRACT = { A convergence of emotions among people in social networks can result from the occurrence of an unprecedented event in the real world; for example, a majority of bloggers reacted angrily to the September 11 terrorist attacks. Based on this observation, we introduce a sentiment index, computed from the current mood tags in a collection of blog posts utilizing an affective lexicon, which can reveal subtle events discussed in the blogosphere. We then develop a method for extracting events based on this index and its distribution. Our second contribution is the establishment of a new bursty structure in text streams, termed a sentiment burst. We employ a stochastic model to detect bursty periods of moods and the events associated with them. Our results on a dataset of more than 12 million mood-tagged blog posts over a 4-year period show that our sentiment-based bursty events are indeed meaningful, in several ways. },
        OWNER = { dinh },
        TIMESTAMP = { 2012.04.05 },
        URL = { http://dx.doi.org/10.1007/978-3-642-28320-8_5 },
    }
BC
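    A toy version of the mood-tag sentiment index: average the lexicon valence of each day's mood tags and inspect extreme dips or spikes. The miniature lexicon and its values below are invented for illustration; the paper derives valences from a psychologically grounded affective lexicon over a much larger tag set.

        # Invented mini affective lexicon: mood tag -> valence in [-1, 1].
        VALENCE = {"happy": 0.8, "excited": 0.9, "sad": -0.7,
                   "angry": -0.8, "shocked": -0.6}

        def sentiment_index(tags_by_day):
            """Daily index: mean valence of that day's mood tags; sharp
            troughs hint at an external event worth inspecting."""
            return {day: sum(VALENCE.get(t, 0.0) for t in tags) / max(len(tags), 1)
                    for day, tags in tags_by_day.items()}

        print(sentiment_index({"2001-09-10": ["happy", "excited"],
                               "2001-09-11": ["angry", "shocked", "sad"]}))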
  • Pervasive multimedia for autism intervention
    Venkatesh, S., Greenhill, S., Phung, D., Adams, B. and Duong, T.. Pervasive and Mobile Computing (PMC), 8(6):863 - 882, 2012. [ | ]
    There is a growing gap between the number of children with autism requiring early intervention and available therapy. We present a portable platform for pervasive delivery of early intervention therapy using multi-touch interfaces and principled ways to deliver stimuli of increasing complexity and adapt to a child's performance. Our implementation weaves Natural Environment Tasks with iPad tasks, facilitating a learning platform that integrates early intervention in the child’s daily life. The system's construction of stimulus complexity relative to task is evaluated by therapists, together with field trials for evaluating both the integrity of the instructional design and goal of stimulus presentation and adjustment relative to performance for learning tasks. We show positive results across all our stakeholders: children, parents and therapists. Our results have implications for other early learning fields that require principled ways to construct lessons across skills and adjust stimuli relative to performance.
    @ARTICLE { venkatesh_greenhill_phung_adams_duong_pmc12,
        TITLE = { Pervasive multimedia for autism intervention },
        AUTHOR = { Venkatesh, S. and Greenhill, S. and Phung, D. and Adams, B. and Duong, T. },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2012 },
        NUMBER = { 6 },
        PAGES = { 863 - 882 },
        VOLUME = { 8 },
        ABSTRACT = { There is a growing gap between the number of children with autism requiring early intervention and available therapy. We present a portable platform for pervasive delivery of early intervention therapy using multi-touch interfaces and principled ways to deliver stimuli of increasing complexity and adapt to a child's performance. Our implementation weaves Natural Environment Tasks with iPad tasks, facilitating a learning platform that integrates early intervention in the child’s daily life. The system's construction of stimulus complexity relative to task is evaluated by therapists, together with field trials for evaluating both the integrity of the instructional design and goal of stimulus presentation and adjustment relative to performance for learning tasks. We show positive results across all our stakeholders: children, parents and therapists. Our results have implications for other early learning fields that require principled ways to construct lessons across skills and adjust stimuli relative to performance. },
        OWNER = { dinh },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2012.08.02 },
    }
J
  • Social Reader: Towards browsing the Social Web
    Adams, B., Phung, D. and Venkatesh, S.. Multimedia Tools and Applications, June 2012. [ | | pdf]
    We describe Social Reader, a feed-reader-plus-social-network aggregator that mines comments from social media in order to display a user’s relational neighborhood as a navigable social network. Social Reader’s network visualization enhances mutual awareness of blogger communities, facilitates their exploration and growth with a fully drag-and-drop interface, and provides novel ways to filter and summarize people, groups, blogs and comments. We discuss the architecture behind the reader, highlight tasks it adds to the workflow of a typical reader, and assess their cost. We also explore the potential of mood-based features in social media applications. Mood is particularly relevant to social media, reflecting the personal nature of the medium. We explore two prototype mood-based features: colour coding the mood of recent posts according to a valence/arousal map, and a mood-based abstract of recent activity using image media. A six-week study of the software involving 20 users confirmed the usefulness of the novel visual display, via a quantitative analysis of use logs and an exit survey. (See the sketch following this entry.)
    @ARTICLE { adams_phung_venkatesh_mtap12,
        TITLE = { Social Reader: Towards browsing the Social Web },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        JOURNAL = { Multimedia Tools and Applications },
        YEAR = { 2012 },
        MONTH = { June },
        PAGES = { 1-40 },
        ABSTRACT = { We describe Social Reader, a feed-reader-plus-social-network aggregator that mines comments from social media in order to display a user’s relational neighborhood as a navigable social network. Social Reader’s network visualization enhances mutual awareness of blogger communities, facilitates their exploration and growth with a fully drag-and-drop interface, and provides novel ways to filter and summarize people, groups, blogs and comments. We discuss the architecture behind the reader, highlight tasks it adds to the workflow of a typical reader, and assess their cost. We also explore the potential of mood-based features in social media applications. Mood is particularly relevant to social media, reflecting the personal nature of the medium. We explore two prototype mood-based features: colour coding the mood of recent posts according to a valence/arousal map, and a mood-based abstract of recent activity using image media. A six-week study of the software involving 20 users confirmed the usefulness of the novel visual display, via a quantitative analysis of use logs and an exit survey. },
        DOI = { 10.1007/s11042-012-1138-5 },
        FILE = { :papers\\phung\\adams_phung_venkatesh_mtap12.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
        URL = { http://www.springerlink.com/content/3k230432w50443l0/fulltext.pdf },
    }
J
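    The mood colour-coding feature can be pictured as a map from a (valence, arousal) pair to a colour. The HSV scheme below is one plausible reading (hue from valence, saturation from arousal), not the paper's actual palette:

        import colorsys

        def mood_colour(valence, arousal):
            """Map (valence, arousal) in [-1, 1]^2 to RGB: hue runs from
            red (negative) to green (positive); saturation tracks arousal."""
            hue = (valence + 1) / 2 * 0.33   # 0.00 = red .. 0.33 = green
            sat = (arousal + 1) / 2
            r, g, b = colorsys.hsv_to_rgb(hue, sat, 1.0)
            return round(r * 255), round(g * 255), round(b * 255)

        print(mood_colour(-0.8, 0.9))  # an 'angry' post maps to a saturated red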
  • Event Extraction Using Behaviors of Sentiment Signals and Burst Structure in Social Media
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. Knowledge and Information Systems, October 2012. [ | ]
    Significant world events often cause the behavioral convergence of the expression of shared sentiment. This paper examines the use of the blogosphere as a framework to study user psychological behaviours, using their sentiment responses as a form of ‘sensor’ to infer real-world events of importance automatically. We formulate a novel temporal sentiment index function using a quantitative measure of the valence value of affect-bearing words in blog posts, where the set of affect-bearing words is drawn from psychological research on emotion structure. The annual local minima and maxima of the proposed sentiment signal function are utilized to extract significant events of the year, and the corresponding blog posts are further analyzed using topic modelling tools to understand their content. The paper then examines the correlation of the discovered topics with world news events reported by the mainstream news service provider, Cable News Network (CNN), and with Google search results. Next, aiming at understanding sentiment at a finer granularity over time, we propose a stochastic burst detection model, extended from the work of Kleinberg, that works incrementally with stream data. The proposed model is then used to extract sentiment bursts occurring within a specific mood label (for example, a burst of observing ‘shocked’). The blog posts at those time indices are analyzed to extract topics, and these are compared to real-world news events. Our comprehensive set of experiments on a large-scale set of 12 million posts from Livejournal shows that the proposed sentiment index (SI) function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events. (See the sketch following this entry.)
    @ARTICLE { nguyen_phung_adams_venkatesh_kais12,
        TITLE = { Event Extraction Using Behaviors of Sentiment Signals and Burst Structure in Social Media },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        JOURNAL = { Knowledge and Information Systems },
        YEAR = { 2012 },
        MONTH = { October },
        PAGES = { 1-26 },
        ABSTRACT = { Significant world events often cause the behavioral convergence of the expression of shared sentiment. This paper examines the use of the blogosphere as a framework to study user psychological behaviours, using their sentiment responses as a form of ‘sensor’ to infer real-world events of importance automatically. We formulate a novel temporal sentiment index function using a quantitative measure of the valence value of affect-bearing words in blog posts, where the set of affect-bearing words is drawn from psychological research on emotion structure. The annual local minima and maxima of the proposed sentiment signal function are utilized to extract significant events of the year, and the corresponding blog posts are further analyzed using topic modelling tools to understand their content. The paper then examines the correlation of the discovered topics with world news events reported by the mainstream news service provider, Cable News Network (CNN), and with Google search results. Next, aiming at understanding sentiment at a finer granularity over time, we propose a stochastic burst detection model, extended from the work of Kleinberg, that works incrementally with stream data. The proposed model is then used to extract sentiment bursts occurring within a specific mood label (for example, a burst of observing ‘shocked’). The blog posts at those time indices are analyzed to extract topics, and these are compared to real-world news events. Our comprehensive set of experiments on a large-scale set of 12 million posts from Livejournal shows that the proposed sentiment index (SI) function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events. },
    }
J
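    The Kleinberg-style burst model the abstract builds on can be reduced to a two-state hidden automaton: a base state emitting counts at a low Poisson rate and a burst state at an elevated rate, with a penalty for switching. The Viterbi-style sketch below is a batch simplification with made-up rates and penalty; the paper's extension works incrementally over streams.

        import math

        def two_state_bursts(counts, base_rate, s=3.0, gamma=1.0):
            """Decode burst periods: state 0 emits Poisson(base_rate),
            state 1 emits Poisson(s * base_rate); switching costs gamma.
            Returns one state per time step (1 marks a burst)."""
            def nll(x, lam):  # exact Poisson negative log-likelihood
                return lam - x * math.log(lam) + math.lgamma(x + 1)

            rates = (base_rate, s * base_rate)
            cost, back = [0.0, gamma], []   # mild preference for starting calm
            for x in counts:
                prev, cur = [], []
                for j in (0, 1):
                    cand = [cost[i] + (gamma if i != j else 0.0) for i in (0, 1)]
                    i_best = 0 if cand[0] <= cand[1] else 1
                    prev.append(i_best)
                    cur.append(cand[i_best] + nll(x, rates[j]))
                back.append(prev)
                cost = cur
            state = 0 if cost[0] <= cost[1] else 1
            path = [state]
            for prev in reversed(back[1:]):  # backtrack to the first step
                state = prev[state]
                path.append(state)
            return path[::-1]

        print(two_state_bursts([2, 3, 2, 9, 11, 10, 2, 1], base_rate=2.5))
        # -> [0, 0, 0, 1, 1, 1, 0, 0]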
  • Detection of Cross-Channel Anomalies
    Pham, S., Budhaditya, S., Phung, D. and Venkatesh, S.. Knowledge and Information Systems (KAIS), June 2012. [ | ]
    The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. (See the sketch following this entry.)
    @ARTICLE { pham_budhaditya_phung_venkatesh_kais12,
        TITLE = { Detection of Cross-Channel Anomalies },
        AUTHOR = { Pham, S. and Budhaditya, S. and Phung, D. and Venkatesh, S. },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2012 },
        MONTH = { June },
        PAGES = { 1-27 },
        ABSTRACT = { The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
    }
J
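    At a very high level, the two-stage scheme can be mimicked with PCA residuals: score items in each channel by their energy outside a low-dimensional principal subspace, then keep the items that fall in the anomalous tail of every channel simultaneously. In this sketch the subspace rank k and tail percentile q are invented knobs; the paper's spectral machinery and theoretical guarantees are only gestured at.

        import numpy as np

        def residual_scores(X, k=2):
            """Score each column of X (one item per column) by its energy
            outside the channel's top-k principal subspace."""
            Xc = X - X.mean(axis=1, keepdims=True)
            U, _, _ = np.linalg.svd(Xc, full_matrices=False)
            proj = U[:, :k] @ (U[:, :k].T @ Xc)
            return np.linalg.norm(Xc - proj, axis=0)

        def cross_channel_anomalies(channels, k=2, q=95.0):
            """Stage 2: items anomalous in every channel at once."""
            flags = []
            for X in channels:
                s = residual_scores(X, k)
                flags.append(s > np.percentile(s, q))
            return np.flatnonzero(np.logical_and.reduce(flags))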
  • A Nonparametric Bayesian Poisson Gamma Model for Count Data
    Gupta, S., Phung, D. and Venkatesh, S.. In Proceedings of International Conference on Pattern Recognition (ICPR), pages 1815-1818, 2012. [ | ]
    We propose a nonparametric Bayesian, linear Poisson gamma model for modeling count data and use it for dictionary learning. A key property of this model is that it captures the parts-based representation similar to nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler, we show that our model can learn the number of factors automatically from the data. The proposed model has been demonstrated using both synthetic and real-world datasets for dictionary learning applications. (See the sketch following this entry.)
    @INPROCEEDINGS { gupta_phung_venkatesh_icpr12,
        TITLE = { A Nonparametric {B}ayesian {P}oisson {G}amma Model for Count Data },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2012 },
        PAGES = { 1815-1818 },
        ABSTRACT = { We propose a nonparametric Bayesian, linear Poisson gamma model for modeling count data and use it for dictionary learning. A key property of this model is that it captures the parts-based representation similar to nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler, we show that our model can learn the number of factors automatically from the data. The proposed model has been demonstrated using both synthetic and real-world datasets for dictionary learning applications. },
        OWNER = { dinh },
        TIMESTAMP = { 2012.06.26 },
    }
C
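    The parametric core of the generative model is easy to write down: gamma-distributed nonnegative factors combined linearly under a Poisson likelihood, giving the NMF-like parts-based view of counts mentioned above. Sizes and hyperparameters below are arbitrary; the nonparametric prior that infers the number of factors, and the auxiliary-variable Gibbs sampler, are not shown.

        import numpy as np

        rng = np.random.default_rng(1)

        # Arbitrary sizes: D word types, K latent factors, N documents.
        D, K, N = 20, 4, 100
        Phi = rng.gamma(shape=1.0, scale=1.0, size=(D, K))    # nonnegative dictionary
        Theta = rng.gamma(shape=0.5, scale=1.0, size=(K, N))  # nonnegative loadings
        X = rng.poisson(Phi @ Theta)                          # observed count matrix

        # Every count decomposes into additive nonnegative parts, as in NMF.
        print(X.shape, X.max())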
  • Multi-modal Abnormality Detection in Video with Unknown Data Segmentation
    Nguyen, Tien Vu, Phung, Dinh, Rana, Santu, Pham, Duc Son and Venkatesh, Svetha. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 1322-1325, Tsukuba, Japan. IEEE, November 2012. [ | ]
    This paper examines a new problem in large-scale stream data: abnormality detection that is localised to a data segmentation process. Unlike traditional abnormality detection methods, which typically build one unified model across the data stream, we propose that building multiple detection models focused on different coherent sections of the video stream results in better detection performance. One key challenge is to segment the data into coherent sections, as the number of segments is not known in advance and can vary greatly across cameras, so a principled approach is required. To this end, we first employ the recently proposed infinite HMM with collapsed Gibbs inference to automatically infer the data segmentation, followed by constructing abnormality detection models localised to each segment. We demonstrate the superior performance of the proposed framework on real-world surveillance camera data recorded over 14 days. (See the sketch following this entry.)
    @INPROCEEDINGS { nguyen_phung_rana_pham_venkatesh_icpr12,
        TITLE = { Multi-modal Abnormality Detection in Video with Unknown Data Segmentation },
        AUTHOR = { Nguyen, Tien Vu and Phung, Dinh and Rana, Santu and Pham, Duc Son and Venkatesh, Svetha },
        BOOKTITLE = { Pattern Recognition (ICPR), 2012 21st International Conference on },
        YEAR = { 2012 },
        ADDRESS = { Tsukuba, Japan },
        MONTH = { November },
        ORGANIZATION = { IEEE },
        PAGES = { 1322--1325 },
        ABSTRACT = { This paper examines a new problem in large-scale stream data: abnormality detection that is localised to a data segmentation process. Unlike traditional abnormality detection methods, which typically build one unified model across the data stream, we propose that building multiple detection models focused on different coherent sections of the video stream results in better detection performance. One key challenge is to segment the data into coherent sections, as the number of segments is not known in advance and can vary greatly across cameras, so a principled approach is required. To this end, we first employ the recently proposed infinite HMM with collapsed Gibbs inference to automatically infer the data segmentation, followed by constructing abnormality detection models localised to each segment. We demonstrate the superior performance of the proposed framework on real-world surveillance camera data recorded over 14 days. },
        OWNER = { dinh },
        TIMESTAMP = { 2012.06.26 },
    }
C
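    Stripped of the infinite-HMM segmentation, the "localised detectors" idea amounts to fitting one simple model per inferred segment and scoring each frame against its own segment's model rather than a single global one. A toy sketch assuming the segment labels z are already given (in the paper they are inferred by the iHMM):

        import numpy as np

        def per_segment_flags(X, z, q=99.0):
            """X: (frames, features); z: one segment label per frame.
            Flag frames unlikely under their own segment's statistics."""
            flags = np.zeros(len(X), dtype=bool)
            for s in np.unique(z):
                idx = np.flatnonzero(z == s)
                seg = X[idx]
                mu, sd = seg.mean(axis=0), seg.std(axis=0) + 1e-6
                score = np.abs((seg - mu) / sd).max(axis=1)  # max |z-score|
                flags[idx[score > np.percentile(score, q)]] = True
            return flags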
  • Embedded Restricted Boltzmann Machines for Fusion of Mixed Data
    Truyen, T., Phung, D. and Venkatesh, S.. In Proc. of IEEE Int. Conf. on Fusion (FUSION), Singapore, July 2012. [ | ]
    @INPROCEEDINGS { truyen_phung_venkatesh_fusion12,
        TITLE = { Embedded Restricted Boltzmann Machines for Fusion of Mixed Data },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE Int. Conf. on Fusion (FUSION) },
        YEAR = { 2012 },
        ADDRESS = { Singapore },
        MONTH = { July },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • Learning Boltzmann distance metric for face recognition
    Truyen, T., Phung, D. and Venkatesh, S.. In Proc. of IEEE International Conference on Multimedia and Expo (ICME), Melbourne, Australia, July 2012. [ | ]
    @INPROCEEDINGS { truyen_phung_venkatesh_icme12,
        TITLE = { Learning Boltzmann distance metric for face recognition },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE International Conference on Multimedia and Expo (ICME) },
        YEAR = { 2012 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { July },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Funniest Thing I've Seen Since: Shifting Perspectives from Multimedia Artefacts to Utterances
    Adams, B., Phung, D. and Venkatesh, S.. In Proceedings of ACM Workshop on Socially-Aware Multimedia, in conjunction with ACM Int. Conf. on Multimedia (ACM-MM), Nara, Japan, October 2012. [ | ]
    @CONFERENCE { adams_phung_venkatesh_acmmm12,
        TITLE = { Funniest Thing I've Seen Since: Shifting Perspectives from Multimedia Artefacts to Utterances },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of ACM Workshop on Socially-Aware Multimedia, in conjunction with ACM Int. Conf. on Multimedia (ACM-MM) },
        YEAR = { 2012 },
        ADDRESS = { Nara, Japan },
        MONTH = { October },
        FILE = { :papers\\phung\\adams_phung_venkatesh_acmmm12.pdf:PDF;:papers\\phung\\adams_phung_venkatesh_acmmm12.pptx:OpenDocument presentation },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • Large-Scale Statistical Modeling of Motion Patterns: A Bayesian Nonparametric Approach
    Rana, S., Phung, D., Pham, S. and Venkatesh, S.. In Proceedings of Indian Conference on Vision, Graphics and Image Processing, India, December 2012. [ | ]
    We propose a novel framework for large-scale scene understanding in static camera surveillance. Our techniques combine fast rank-1 constrained robust PCA to compute the foreground, with non-parametric Bayesian models for inference. Clusters are extracted in foreground patterns using a joint multinomial+Gaussian Dirichlet process model (DPM). Since the multinomial distribution is normalized, the Gaussian mixture distinguishes between similar spatial patterns but different activity levels (e.g., car vs. bike). We propose a modification of the decayed MCMC technique for incremental inference, providing the ability to discover theoretically unlimited patterns in unbounded video streams. A promising by-product of our framework is online abnormal activity detection. A benchmark video and two surveillance videos, the longest being 140 hours long, are used in our experiments. The patterns discovered are as informative as those of existing scene understanding algorithms. However, unlike existing work, we achieve near real-time execution and encouraging performance in abnormal activity detection.
    @INPROCEEDINGS { rana_phung_pham_venkatesh_civgip12,
        TITLE = { Large-Scale Statistical Modeling of Motion Patterns: A Bayesian Nonparametric Approach },
        AUTHOR = { Rana, S. and Phung, D. and Pham, S. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of Indian Conference on Vision, Graphics and Image Processing },
        YEAR = { 2012 },
        ADDRESS = { India },
        MONTH = { December },
        ABSTRACT = { We propose a novel framework for large-scale scene understanding in static camera surveillance. Our techniques combine fast rank-1 constrained robust PCA to compute the foreground, with non-parametric Bayesian models for inference. Clusters are extracted in foreground patterns using a joint multinomial+Gaussian Dirichlet process model (DPM). Since the multinomial distribution is normalized, the Gaussian mixture distinguishes between similar spatial patterns but different activity levels (e.g., car vs. bike). We propose a modification of the decayed MCMC technique for incremental inference, providing the ability to discover theoretically unlimited patterns in unbounded video streams. A promising by-product of our framework is online abnormal activity detection. A benchmark video and two surveillance videos, the longest being 140 hours long, are used in our experiments. The patterns discovered are as informative as those of existing scene understanding algorithms. However, unlike existing work, we achieve near real-time execution and encouraging performance in abnormal activity detection. },
        FILE = { :papers\\phung\\rana_phung_pham_venkatesh_civgip12.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • TOBY Playpad - an accelerated learning tool for children with autism
    Duong, T., Venkatesh, S., Phung, D., Greenhill, S. and Adams, B.. Autism Spectrum Disorder Research Forum, Melbourne, Australia, November 2012. [ | ]
    The diagnosis of children with Autism Spectrum Disorder (ASD) is on the rise, and it is well known that early intervention is critical. However, there is often a long gap of waiting, and wasted time, between a “formal” diagnosis and therapy. The aim of TOBY Playpad (www.tobyplaypad.com) is to close this gap by empowering parents to help their children early and naturally, at home and in their daily activities. The current form of TOBY is an iPad application that provides an adaptive syllabus of more than 200 activities, developed by autism and machine learning experts, to target key development areas known to be in deficit for children with ASD, such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad activities and off-iPad Natural Environment Tasks (NET). Since each child is different, TOBY is highly adaptive and perso