Sequential Data Collection Method with Condensed Local Differential Privacy
Authors: Jin Yan, Zhu Youwen, Wu Qihui

Affiliations:

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; 2. College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Fund Project:

Key Research and Development Program of Jiangsu Province (Industry Foresight and Key Core Technologies) (BE2022068, BE2022068-1); National Natural Science Foundation of China (62172216); Fundamental Research Funds for the Central Universities (NP2024117); Stable Support Program for Basic Research in National Defense Characteristic Disciplines (ILF240061A24).

Abstract:

Condensed local differential privacy (CLDP) is a metric-based relaxation of local differential privacy that offers better utility and flexibility. However, existing CLDP schemes fall short in capturing sequence patterns and in utility. To address these limitations, this paper proposes SCM-CLDP, a novel sequential data collection method based on condensed local differential privacy. During collection, SCM-CLDP takes into account important properties of sequential data such as sequence length and transitions; from this information, the data collector can synthesize a privacy-preserving dataset that is close to the original dataset. Depending on the object being perturbed, two collection methods are proposed: SCM-VP, based on value perturbation, and SCM-TP, based on transition perturbation. We prove theoretically that SCM-VP and SCM-TP satisfy sequence-level condensed local differential privacy, and we compare them with existing schemes on two real datasets in terms of Markov chain model accuracy, synthetic dataset utility, and frequent sequence pattern mining accuracy. The results show that SCM-CLDP performs significantly better than existing schemes, with SCM-VP outperforming SCM-TP in most cases. In the best case, SCM-CLDP reduces the error of the Markov chain model and the distribution error of the synthetic dataset by at least one order of magnitude compared with existing methods. SCM-CLDP also improves the accuracy of item frequency ranking in the synthetic dataset and the accuracy of frequent sequence pattern mining by nearly 30% compared with existing methods.
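To make the notion of a metric-based relaxation concrete, the following is a minimal sketch of a generic exponential-style mechanism that satisfies α-CLDP for a single value, meaning that for any two inputs v1, v2 and any output y, Pr[M(v1) = y] ≤ exp(α · d(v1, v2)) · Pr[M(v2) = y]. It is not the SCM-VP or SCM-TP method of the paper, whose details are not given in the abstract. The function names, the integer domain, the absolute-difference metric, and the value of `alpha` are all assumptions chosen for illustration.

```python
import math
import random


def cldp_probabilities(v, domain, alpha, dist):
    """Output distribution with Pr[y | v] proportional to exp(-alpha * d(v, y) / 2).

    Hypothetical helper: a generic metric-based mechanism, not the paper's method.
    """
    weights = [math.exp(-alpha * dist(v, y) / 2.0) for y in domain]
    total = sum(weights)
    return [w / total for w in weights]


def cldp_perturb(v, domain, alpha, dist, rng=random):
    """Sample one perturbed value for the true value v."""
    probs = cldp_probabilities(v, domain, alpha, dist)
    return rng.choices(domain, weights=probs, k=1)[0]


def satisfies_cldp(domain, alpha, dist, tol=1e-12):
    """Numerically check Pr[y | v1] <= exp(alpha * d(v1, v2)) * Pr[y | v2] for all v1, v2, y."""
    for v1 in domain:
        p1 = cldp_probabilities(v1, domain, alpha, dist)
        for v2 in domain:
            p2 = cldp_probabilities(v2, domain, alpha, dist)
            bound = math.exp(alpha * dist(v1, v2))
            if any(a > bound * b + tol for a, b in zip(p1, p2)):
                return False
    return True


def abs_dist(a, b):
    """Assumed metric: absolute difference between integer items."""
    return abs(a - b)


if __name__ == "__main__":
    domain = list(range(10))  # assumed item universe
    alpha = 0.5               # assumed privacy parameter
    print(satisfies_cldp(domain, alpha, abs_dist))
    print(cldp_perturb(3, domain, alpha, abs_dist, random.Random(0)))
```

The factor 1/2 in the exponent is what makes the bound hold: by the triangle inequality, both the unnormalized weight and the normalizing constant can each change by at most a factor of exp(α · d(v1, v2) / 2). Outputs close to the true value under the metric are more likely than distant ones, which is the source of the utility gain over plain local differential privacy.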

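Cite this article

Jin Yan, Zhu Youwen, Wu Qihui. Sequential data collection method with condensed local differential privacy[J]. Journal of Data Acquisition and Processing, 2025, 40(3): 659-674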

History
  • Received: 2024-06-19
  • Last revised: 2024-10-22
  • Published online: 2025-06-13