Wednesday, April 11, 2018

Ceph pool rbd has many more objects per pg than average (too few pgs?)


Locating the issue

[root@lab8106 ~]# ceph -s
    cluster fa7ec1a1-662a-4ba3-b478-7cb570482b62
     health HEALTH_WARN
            pool rbd has many more objects per pg than average (too few pgs?)
     monmap e1: 1 mons at {lab8106=192.168.8.106:6789/0}
            election epoch 30, quorum 0 lab8106
     osdmap e157: 2 osds: 2 up, 2 in
            flags sortbitwise
      pgmap v1023: 417 pgs, 13 pools, 18519 MB data, 15920 objects
            18668 MB used, 538 GB / 556 GB avail
                 417 active+clean

The cluster shows this warning: pool rbd has many more objects per pg than average (too few pgs?). In the hammer version the same alert reads: pool rbd has too few pgs.

Start by looking at the health details of the cluster:

[root@lab8106 ~]# ceph health detail
HEALTH_WARN pool rbd has many more objects per pg than average (too few pgs?); mon.lab8106 low disk space
pool rbd objects per pg (1912) is more than 50.3158 times cluster average (38)
Then look at the object counts of each pool in the cluster:

[root@lab8106 ~]# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    556G      538G       18668M          3.28
POOLS:
    NAME       ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd        6      16071M      2.82          536G       15296
    pool1      7        204M      0.04          536G          52
    pool2      8        184M      0.03          536G          47
    pool3      9        188M      0.03          536G          48
    pool4      10       192M      0.03          536G          49
    pool5      11       204M      0.04          536G          52
    pool6      12       148M      0.03          536G          38
    pool7      13       184M      0.03          536G          47
    pool8      14       200M      0.04          536G          51
    pool9      15       200M      0.04          536G          51
    pool10     16       248M      0.04          536G          63
    pool11     17       232M      0.04          536G          59
    pool12     18       264M      0.05          536G          67
Check the pg count of each storage pool:

[root@lab8106 ~]# ceph osd dump|grep pool
pool 6 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 132 flags hashpspool stripe_width 0
pool 7 'pool1' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 134 flags hashpspool stripe_width 0
pool 8 'pool2' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 136 flags hashpspool stripe_width 0
pool 9 'pool3' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 138 flags hashpspool stripe_width 0
pool 10 'pool4' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 140 flags hashpspool stripe_width 0
pool 11 'pool5' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 142 flags hashpspool stripe_width 0
pool 12 'pool6' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 144 flags hashpspool stripe_width 0
pool 13 'pool7' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 146 flags hashpspool stripe_width 0
pool 14 'pool8' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 148 flags hashpspool stripe_width 0
pool 15 'pool9' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 150 flags hashpspool stripe_width 0
pool 16 'pool10' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 152 flags hashpspool stripe_width 0
pool 17 'pool11' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 154 flags hashpspool stripe_width 0
pool 18 'pool12' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 156 flags hashpspool stripe_width 0
Let's see how this figure is derived:

pool rbd objects per pg (1912) is more than 50.3158 times cluster average (38)

rbd objects_per_pg     = 15296 / 8   = 1912   (objects in the rbd pool / its pg_num)
average_objects_per_pg = 15920 / 417 = 38     (total objects / total pg count, integer division)
ratio = rbd objects_per_pg / average_objects_per_pg = 1912 / 38 ≈ 50.3158
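
A quick sanity check of the arithmetic from the shell, using the numbers shown by ceph -s, ceph df and ceph osd dump above (the monitor truncates the per-pg counts to integers, which is why the average comes out as 38 rather than 38.2):

[root@lab8106 ~]# echo $((15296 / 8))              # objects per pg in the rbd pool
1912
[root@lab8106 ~]# echo $((15920 / 417))            # cluster-wide average objects per pg
38
[root@lab8106 ~]# echo "scale=6; 1912 / 38" | bc   # the skew ratio, reported by ceph as 50.3158
50.315789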

In other words, the rbd pool holds far more objects per pg than the cluster-wide average, because the other pools contribute plenty of pgs but very few objects. Let's look at the check in the monitor code:

https://github.com/ceph/ceph/blob/master/src/mon/PGMonitor.cc

// cluster-wide average: total objects / total pg count
int average_objects_per_pg = pg_map.pg_sum.stats.sum.num_objects / pg_map.pg_stat.size();
if (average_objects_per_pg > 0 &&
    pg_map.pg_sum.stats.sum.num_objects >= g_conf->mon_pg_warn_min_objects &&
    p->second.stats.sum.num_objects >= g_conf->mon_pg_warn_min_pool_objects) {
  // objects per pg for this pool, and its skew against the cluster average
  int objects_per_pg = p->second.stats.sum.num_objects / pi->get_pg_num();
  float ratio = (float)objects_per_pg / (float)average_objects_per_pg;
  if (g_conf->mon_pg_warn_max_object_skew > 0 &&
      ratio > g_conf->mon_pg_warn_max_object_skew) {
    ostringstream ss;
    ss << "pool " << name << " has many more objects per pg than average (too few pgs?)";
    summary.push_back(make_pair(HEALTH_WARN, ss.str()));
    if (detail) {
      ostringstream ss;
      ss << "pool " << name << " objects per pg ("
         << objects_per_pg << ") is more than " << ratio << " times cluster average ("
         << average_objects_per_pg << ")";
      detail->push_back(make_pair(HEALTH_WARN, ss.str()));
    }
  }
}
The main thresholds involved are:

mon_pg_warn_min_objects = 10000       // the check only runs when the cluster holds at least 10000 objects in total
mon_pg_warn_min_pool_objects = 1000   // ...and the pool itself holds at least 1000 objects
mon_pg_warn_max_object_skew = 10      // warn when the pool's objects per pg exceed 10 times the cluster average

In this cluster all three conditions are met (15920 objects in total, 15296 of them in rbd, and a ratio of 50.3), so the warning is raised.
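
On a running monitor you can confirm the values currently in effect through the admin socket; mon.lab8106 is the monitor of this test cluster, substitute your own:

[root@lab8106 ~]# ceph daemon mon.lab8106 config show | grep mon_pg_warn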

Solving the problem
There are three ways to make this warning go away:

Delete unused storage pools
If the cluster contains pools that are no longer used but still account for a fair number of pgs, deleting them removes those pgs from the total. That raises the cluster-wide average objects per pg, which lowers the skew ratio back below mon_pg_warn_max_object_skew, and the warning disappears.
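
For example, if pool12 above (200 pgs but only 67 objects) really were unused, it could be dropped like this; the command is destructive, so double-check the pool name first:

[root@lab8106 ~]# ceph osd pool delete pool12 pool12 --yes-i-really-really-mean-it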

You can also increase the pg count of the pool named in the warning. The rbd pool simply never had enough pgs for the number of objects it holds; raising its pg_num (and pgp_num) lowers its objects-per-pg figure, and with it the skew ratio.
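
A minimal sketch of that, assuming a target of 128 pgs for the rbd pool (pick a value that fits your OSD count; pgp_num can only be raised after pg_num):

[root@lab8106 ~]# ceph osd pool set rbd pg_num 128
[root@lab8106 ~]# ceph osd pool set rbd pgp_num 128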

Increase the mon_pg_warn_max_object_skew parameter
If the cluster already has enough pgs, adding more would only cause unnecessary data movement. In that case you can silence the warning by raising this parameter; the default value is 10.
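
One way to apply that at runtime is injectargs (the value 30 here is only an example; to make it permanent, also add the option to the [mon] section of ceph.conf, and note that on some versions the monitor needs a restart before the new value takes effect):

[root@lab8106 ~]# ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 30'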
To sum up
This warning compares a pool's objects-per-pg figure against the cluster-wide average objects per pg; when the pool deviates from the average by more than the configured factor, the warning is raised.

The check, step by step:

ceph health detail
ceph df
ceph osd dump | grep pool
mon_pg_warn_max_object_skew = 10.0

When ((objects / pg_num) of the affected pool) / (objects / pg_num of the whole cluster) exceeds 10.0, the warning appears.
