Approximate Aggregate Query Method Based on Two-Stage Stratified Sampling
Author: Fang Jun, Zhao Bo, Zuo Changqi
Affiliation:

1. College of Information, North China University of Technology, Beijing 100144, China; 2. Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing 100144, China

CLC Number:

TP391

    Abstract:

    Interactive query analysis, as exemplified by data warehouse applications, supports intelligent decision-making. As data volumes keep growing, computing exact query results often requires scanning the entire dataset, so group-by queries struggle to respond in real time. In many scenarios, a feasible solution is to answer aggregate queries approximately from pre-extracted sample data. This paper analyzes the specific conditions under which stratified sampling outperforms random sampling and proposes a two-stage stratified sampling method. In the first stage, the data are grouped according to business characteristics; within each group, random sampling is applied first and the quality of the resulting sample is evaluated. To improve the accuracy of approximate queries, a second sampling stage is then carried out, in which the self-organizing feature map (SOM) clustering method is used to group the values. Experimental results on a public dataset and on real power grid data show that, at the same sampling rate, the proposed method improves performance by up to 15% compared with random sampling, stratified random sampling, and the congressional sampling algorithm. In addition, SOM yields better approximate query results than the K-means and density-based spatial clustering of applications with noise (DBSCAN) clustering methods.
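The abstract describes the method only at a high level, so the following Python sketch is a hedged illustration under stated assumptions, not the authors' implementation. Rows are grouped by a business key. Within each group a simple random sample is drawn and "evaluated" by its estimated relative standard error. If that error exceeds a threshold, a minimal hand-written 1-D SOM clusters the group's values into strata and a stratified sample with proportional allocation replaces the random one. The evaluation rule, the threshold `max_rel_se`, the allocation scheme, and every function name and parameter (`variance_srs_vs_proportional`, `som_1d`, `two_stage_mean`, `approximate_group_avg`, `k`, `epochs`, `lr0`) are hypothetical. `variance_srs_vs_proportional` uses the textbook finite-population variance formulas to show the kind of condition the paper analyzes: with proportional allocation, stratification beats random sampling when the strata means differ strongly relative to the within-stratum variances.

```python
import math
import random
import statistics
from collections import defaultdict


def variance_srs_vs_proportional(strata, n):
    """Variance of the sample mean for sample size n under simple random
    sampling (SRS) vs. stratified sampling with proportional allocation,
    using the textbook finite-population formulas (assumes n_h = n * N_h / N)."""
    population = [v for s in strata for v in s]
    N = len(population)
    fpc = 1 - n / N  # finite-population correction
    v_srs = fpc * statistics.variance(population) / n
    v_prop = fpc / n * sum(len(s) / N * statistics.variance(s) for s in strata)
    return v_srs, v_prop


def som_1d(values, k=4, epochs=10, lr0=0.5, seed=0):
    """Minimal 1-D self-organizing map (hypothetical helper): k nodes on a
    line, Gaussian neighbourhood, linearly decaying learning rate and radius.
    Returns the index of the best-matching node for every value."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Spread the initial node weights evenly over the value range.
    weights = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    sigma0 = max(k / 2, 1.0)
    order = list(values)
    steps, t = epochs * len(order), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for x in order:
            frac = t / steps
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            bmu = min(range(k), key=lambda j: abs(weights[j] - x))
            # Pull the winning node and its map neighbours towards x.
            for j in range(k):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma**2))
                weights[j] += lr * h * (x - weights[j])
            t += 1
    return [min(range(k), key=lambda j: abs(weights[j] - x)) for x in values]


def two_stage_mean(values, rate, k=4, max_rel_se=0.05, seed=0):
    """Estimate the mean of one business group's values.

    Stage 1: simple random sample, judged by its estimated relative standard
    error (this evaluation rule and its threshold are assumptions).
    Stage 2: only if stage 1 is too noisy, SOM clusters become strata and
    the same sample budget is spread over them proportionally."""
    rng = random.Random(seed)
    N = len(values)
    if N < 2:
        return statistics.fmean(values), "exact"
    n = min(N, max(2, round(N * rate)))
    sample = rng.sample(values, n)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n) * math.sqrt(1 - n / N)
    if mean != 0 and se / abs(mean) <= max_rel_se:
        return mean, "stage1"
    strata = defaultdict(list)
    for v, label in zip(values, som_1d(values, k, seed=seed)):
        strata[label].append(v)
    estimate = 0.0
    for members in strata.values():
        # Proportional allocation; rounding may shift the total by a few rows.
        n_h = min(len(members), max(1, round(n * len(members) / N)))
        estimate += len(members) / N * statistics.fmean(rng.sample(members, n_h))
    return estimate, "stage2"


def approximate_group_avg(rows, rate, **kwargs):
    """Group (business_key, value) rows by key, then run two_stage_mean per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: two_stage_mean(vals, rate, **kwargs) for key, vals in groups.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Group A is bimodal (random sampling is noisy); group B is nearly constant.
    rows = [("A", 10 + rng.random()) for _ in range(700)]
    rows += [("A", 1000 + rng.random()) for _ in range(300)]
    rows += [("B", 100 + rng.random()) for _ in range(1000)]
    print(approximate_group_avg(rows, rate=0.05))
```

With this synthetic data, group B should stay with its stage-1 random sample, while the bimodal group A should fall through to the SOM strata, whose estimate is typically far closer to the true mean. Clustering scalar values with a 1-D map keeps the sketch short. The abstract does not describe the paper's SOM configuration (map size, input features, training schedule) or how it compares against K-means and DBSCAN, so those choices here are purely illustrative.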

Citation:

Fang Jun, Zhao Bo, Zuo Changqi. Approximate Aggregate Query Method Based on Two-Stage Stratified Sampling[J].,2022,37(5):1049-1058.

History
  • Received: September 10, 2021
  • Revised: January 23, 2022
  • Online: September 25, 2022