Solution Method of Gaussian Mixture Model with Statistical-Aware Strategy
Affiliation:

1.College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China;2.Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, China

Fund Project:

National Natural Science Foundation of China, General Program (61972261); Natural Science Foundation of Guangdong Province, General Program (2023A1515011667); Shenzhen Basic Research Key Project (JCYJ20220818100205012); Shenzhen Basic Research General Project (JCYJ20210324093609026).

    Abstract:

    The Gaussian mixture model (GMM) is a classic probabilistic model that is widely used in unsupervised learning to determine the class distribution of unlabeled samples. The expectation-maximization (EM) algorithm is an important technique for solving GMM parameters: it determines the parameters of the component models and their mixing coefficients by computing the optimal solution of the GMM likelihood function. Solving a GMM with EM has two defects: the algorithm is prone to getting stuck in local optima, and the component parameters it determines are unstable, especially for multi-dimensional random variables. This paper therefore proposes a GMM solution method based on a statistical-aware (SA) strategy, namely the SA-GMM method. Starting from the estimation of the unknown probability density function of a given data set, the method establishes a connection between kernel density estimation (KDE) and GMM. To avoid selecting an over-smoothing KDE bandwidth, an objective function is designed that simultaneously minimizes the empirical risk between KDE and GMM and the structural risk of the KDE bandwidth, from which the optimal GMM parameters are determined. Experiments on 11 standard probability distributions confirm the feasibility, rationality, and effectiveness of SA-GMM, and the results also show that SA-GMM achieves significantly better probability density function estimation than the EM-based GMM and its variants.
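    The abstract describes an objective that combines an empirical risk between KDE and GMM with a structural risk on the KDE bandwidth, but it does not give the formula. The following is only a minimal 1-D sketch of that idea under stated assumptions: the squared-error form of the empirical risk, the `lam * h**2` bandwidth penalty, the function name `sa_objective`, and the sample data are all hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, sigmas):
    """Density of a 1-D Gaussian mixture; weights are assumed to sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def kde_pdf(x, data, h):
    """Gaussian kernel density estimate with bandwidth h."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)

def sa_objective(data, h, weights, means, sigmas, lam=0.1):
    """Hypothetical objective: mean squared KDE-GMM gap plus a bandwidth penalty.

    Both the squared-error form and the lam * h**2 penalty are assumptions
    made for illustration; the paper's actual objective is not given here.
    """
    empirical = sum(
        (kde_pdf(x, data, h) - gmm_pdf(x, weights, means, sigmas)) ** 2
        for x in data
    ) / len(data)
    structural = lam * h ** 2
    return empirical + structural

# Small, fixed bimodal sample (made up for this example).
data = [-2.3, -2.1, -2.0, -1.9, -1.7, 1.6, 1.9, 2.0, 2.2, 2.4]
h = 0.4

# A mixture centred on the two clusters versus one centred between them.
good = sa_objective(data, h, [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])
bad = sa_objective(data, h, [0.5, 0.5], [0.0, 0.0], [0.5, 0.5])
print(good < bad)
```

    In this sketch the mixture whose components sit on the two clusters scores lower than the one centred between them. A full method would minimize such an objective jointly over the GMM parameters and the bandwidth with a numerical optimizer, which is omitted here.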

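Cite this article

CHEN Jiaqi, HE Yulin, HUANG Zhexue, FOURNIER-VIGER Philippe. Solution Method of Gaussian Mixture Model with Statistical-Aware Strategy[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2023, 38(3): 525-538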

History
  • Received: 2022-06-30
  • Last revised: 2022-12-11
  • Published online: 2023-05-25