欢迎访问昆山宝鼎软件有限公司网站! 设为首页 | 网站地图 | XML | RSS订阅 | 宝鼎邮箱 | 后台管理


新闻资讯

MENU

软件开发知识

HISTOGRAM-$.number-of-buckets-specified AS number_of_bucket

点击: 次  来源:劳务派遣管理系统 时间:2018-09-16

原文出处: MSSQL123

直方图是表上某个字段在凭据必然百分比和纪律采样后的数据漫衍的一种描写,最重要的浸染之一就是按照查询条件,预估切合条件的数据量,为sql执行打算的生成提供重要的依据。

在MySQL 8.0之前的版本中,MySQL仅有一个简朴的统计信息却没有直方图,没有直方图的统计信息可以说是没有任何意义的。

MySQL 8.0新特性之一就是开始支持统计信息的直方图,这个观念很早就提出来了,昆山软件开发,抽闲详细实验了一下利用要领。

之前写过MSSQL相关统计信息的一点对象,在道理上都是一致的,https://www.cnblogs.com/wy123/p/5875237.html

还是,直接上例子,造数据,建设一个测试情况

create table test
(
    id int auto_increment primary key,
    name varchar(100),
    create_date datetime ,
    index (create_date desc)
);


USE `db01`$$

DROP PROCEDURE IF EXISTS `insert_test_data`$$

CREATE DEFINER=`root`@`%` PROCEDURE `insert_test_data`()
BEGIN
    DECLARE v_loop INT;
    SET v_loop = 100000;
    WHILE v_loop>0 DO
        INSERT INTO test(NAME,create_date)VALUES (UUID(),DATE_ADD(NOW(),INTERVAL -RAND()*100000 MINUTE) );
        SET v_loop = v_loop - 1;
    END WHILE;
END$$

DELIMITER ;

MySQL中统计信息的建设,差异于MSSQL,MySQL统计信息不依赖于索引,需要单独建设,语法如下

  • 建设字段上的统计直方图信息
  • ANALYZE TABLE test UPDATE HISTOGRAM ON create_date,name WITH 16 BUCKETS;

  • 删除字段上的统计直方图信息
  • ANALYZE TABLE test DROP HISTOGRAM ON create_date

    1,可以一次性建设多个字段的统计信息,系统会逐个建设列出的字段上的统计信息,统计信息不依赖于索引,这一点与MSSQL差异(虽然MSSQL也可以抛开索引独立建设统计信息)
    2,BUCKETS值是一个必需提供的参数,默认值为1000,范畴是1-1024,这一点也差异与MSSQL也纷歧样,MSSQL是有一个雷同的最大值为200的步长(step)字段
    3,一般来说,数据量较大的环境下,对付不反复可能反复性不高的数据,BUCKETS值越大,描写出来的统计信息越具体
    4,统计信息的详细内容在 information_schema.column_statistics中,可是可读性并欠好,可以按照需求自行理会(出来一种本身喜欢的名目)

    与sqlserver中的统计信息一样,理论上,在精确性与取样百分比(BUCKETS)是成正比的,虽然生成统计信息的价钱也就越大,至于BUCKETS与统计信息的取样百分比,以及综合价钱,昆山软件开发,笔者临时没有找到相关的资料。

    如下是通过ANALYZE TABLE test UPDATE HISTOGRAM ON create_date WITH 4 BUCKETS;建设的统计信息直方图
    可以发明直方图的HISTOGRAM字段是一个JSON名目标字符串,可读性并欠好。

     HISTOGRAM-$.number-of-buckets-specified AS number_of_buckets_specified FROM INFORMATION_SCHEMA.COLUMN_STATISTIC <a href=劳务调派系统S WHERE schema_name = p_schema_name AND table_name = p_table_name AND column_name = p_column_name ) SELECT CASE WHEN id = 1 THEN p_schema_name ELSE END AS schema_name" src="http://www.importnew.com/https:/images2018.cnblogs.com/blog/380271/201808/380271-20180823180150916-868204153.png" />

    想到了sqlserver中DBCC SHOW_STATISTICS的直方图信息,如下的名目,直方图中的数据漫衍环境看起来很是清晰直观

     HISTOGRAM-$.number-of-buckets-specified AS number_of_buckets_specified FROM INFORMATION_SCHEMA.COLUMN_STATISTIC <a href=劳务调派系统S WHERE schema_name = p_schema_name AND table_name = p_table_name AND column_name = p_column_name ) SELECT CASE WHEN id = 1 THEN p_schema_name ELSE END AS schema_name" src="http://www.importnew.com/https:/images2018.cnblogs.com/blog/380271/201808/380271-20180823180457193-6409539.png" />

    于是就做了一个MySQL直方图的名目转换,说白了就是理会information_schema.column_statistics表中的HISTOGRAM 字段中的JSON内容

    如下,一个简朴的理会直方图统计信息json数据的存储进程,参数别离是库名,表名,字段名

    DELIMITER $$
    
    USE `db01`$$
    
    DROP PROCEDURE IF EXISTS `parse_column_statistics`$$
    
    CREATE DEFINER=`root`@`%` PROCEDURE `parse_column_statistics`(
        IN `p_schema_name` VARCHAR(200),
        IN `p_table_name` VARCHAR(200),
        IN `p_column_name` VARCHAR(200)
    )
    BEGIN
        
        DECLARE v_histogram TEXT;
        -- get the special HISTOGRAM
        SELECT HISTOGRAM->>'$."buckets"' INTO v_HISTOGRAM 
        FROM   information_schema.column_statistics
        WHERE schema_name =  p_schema_name 
        AND table_name = p_table_name 
        AND column_name = p_column_name; 
        
        -- remove the first and last [ and ] char
        SET v_histogram = SUBSTRING(v_HISTOGRAM,2,LENGTH(v_HISTOGRAM)-2);
    
        DROP TABLE IF EXISTS t_buckets ;
        CREATE TEMPORARY TABLE t_buckets
        (
            id INT AUTO_INCREMENT PRIMARY KEY,
            buckets_content VARCHAR(500)
        );
        
        -- split by "]," and get single bucket content    
        WHILE (INSTR(v_histogram,'],')>0) DO
            INSERT INTO t_buckets(buckets_content)
            SELECT SUBSTRING(v_histogram,1,INSTR(v_histogram,'],'));
            SET v_HISTOGRAM = SUBSTRING(v_histogram,INSTR(v_histogram,'],')+2,LENGTH(v_histogram));    
        END WHILE;
       
        INSERT INTO t_buckets(buckets_content) 
        SELECT v_histogram;
        
        -- get the basic statistics data
        WITH cte AS
        (
            SELECT 
            HISTOGRAM->>'$."last-updated"' AS last_updated,
            HISTOGRAM->>'$."number-of-buckets-specified"' AS number_of_buckets_specified
            FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
            WHERE schema_name =  p_schema_name 
            AND table_name = p_table_name 
            AND column_name = p_column_name
        )
        SELECT 
            CASE WHEN id = 1 THEN p_schema_name ELSE '' END AS schema_name,
            CASE WHEN id = 1 THEN p_table_name ELSE '' END AS table_name,
            CASE WHEN id = 1 THEN p_column_name ELSE '' END AS column_name, 
            CASE WHEN id = 1 THEN last_updated ELSE '' END AS last_updated,
            CASE WHEN id = 1 THEN number_of_buckets_specified ELSE '' END AS 'number_of_buckets_specified' ,
            id AS buckets_specified_index,
            buckets_content
        FROM
        (
            SELECT * FROM cte,t_buckets
        )t;
    
    END$$
    
    DELIMITER ;