[root@node1 omap]# ceph -s
cluster 911c57dc-a930-4da8-ab0e-69f6b6586e3d
health HEALTH_WARN
328 pgs degraded
328 pgs stuck degraded
328 pgs stuck unclean
328 pgs stuck undersized
328 pgs undersized
recovery 319/957 objects degraded (33.333%)
too many PGs per OSD (328 > max 300)
monmap e1: 3 mons at {node1=192.168.1.141:6789/0,node2=192.168.1.142:6789/0,node3=192.168.1.143:6789/0}
election epoch 108, quorum 0,1,2 node1,node2,node3
fsmap e705662: 1/1/1 up {0=node1=up:active}, 2 up:standby
osdmap e705733: 3 osds: 2 up, 2 in; 328 remapped pgs
flags sortbitwise,require_jewel_osds
pgmap v1296660: 328 pgs, 12 pools, 457 MB data, 319 objects
1071 MB used, 29626 MB / 30697 MB avail
319/957 objects degraded (33.333%)
328 active+undersized+degraded
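The status above reports `3 osds: 2 up, 2 in`, i.e. one OSD is down. A quick way to locate it (standard ceph/systemd commands, not part of the original log; they require a running cluster):

```shell
# Show the CRUSH tree with per-OSD up/down state; the down OSD is
# flagged in the status column.
ceph osd tree

# On the host that owns the down OSD, check its systemd unit
# (assumes systemd-managed OSDs, as used later in this post):
systemctl status ceph-osd@0
```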
A check showed that osd.0 on node1 was down, so I tried to start it. The OSD startup log showed:
2018-05-10 12:08:08.756988 7fa896131800 0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2018-05-10 12:08:08.760327 7fa896131800 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2018-05-10 12:08:08.762157 7fa896131800 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 18: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2018-05-10 12:08:08.763165 7fa896131800 1 filestore(/var/lib/ceph/osd/ceph-0) upgrade
2018-05-10 12:08:08.763591 7fa896131800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2018-05-10 12:08:08.763774 7fa896131800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2018-05-10 12:08:08.770974 7fa896131800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fa896131800 time 2018-05-10 12:08:08.769611
osd/OSD.h: 894: FAILED assert(ret)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x560b85e5c9e5]
2: (OSDService::get_map(unsigned int)+0x3d) [0x560b8582f14d]
3: (OSD::init()+0x1fe2) [0x560b857e2b42]
4: (main()+0x2c01) [0x560b85746461]
5: (__libc_start_main()+0xf5) [0x7fa892e2fc05]
6: (()+0x35d917) [0x560b85790917]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
All this shows is that `get_map(epoch_t)` hit `FAILED assert(ret)`, with no indication of the cause. So I started the OSD in the foreground with debug logging enabled to get more detail:
[root@node1 ceph-0]# ceph-osd -f --cluster ceph -i 0 --setuser ceph --setgroup ceph --debug-osd=10 --debug-filestore=10 --log-to-stderr=1
The output contained the following:
-2> 2018-05-10 12:57:33.725952 7f66199a7800 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/meta/DIR_4/DIR_A/DIR_0/osdmap.705730__0_C8EB90A4__none with flags=2: (13) Permission denied
-1> 2018-05-10 12:57:33.725960 7f66199a7800 10 filestore(/var/lib/ceph/osd/ceph-0) FileStore::read(meta/#-1:2509d713:::osdmap.705730:0#) open error: (13) Permission denied
0> 2018-05-10 12:57:33.727508 7f66199a7800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f66199a7800 time 2018-05-10 12:57:33.725969
osd/OSD.h: 894: FAILED assert(ret)
So an osdmap object file cannot be opened: permission denied. Checking that directory:
[root@node1 DIR_0]# ll
total 24
-rw-r--r-- 1 ceph ceph 5171 Apr 25 09:32 osdmap.705305__0_C8F530A4__none
-rw-r--r-- 1 ceph ceph 5171 Apr 25 09:39 osdmap.705518__0_C8F460A4__none
-rw-r--r-- 1 root root 5659 May 10 11:48 osdmap.705730__0_C8EB90A4__none
The newest file, osdmap.705730__0_C8EB90A4__none, is owned by root:root while the others are ceph:ceph, so the OSD (running as the ceph user) cannot read it. Fix its ownership:
[root@node1 DIR_0]# chown ceph:ceph osdmap.705730__0_C8EB90A4__none
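One root-owned file was fixed above, but others could lurk elsewhere under the OSD data directory. A hedged sketch of a one-pass sweep (the path and the ceph user are the defaults from this cluster; `fix_ownership` is a helper name introduced here, and on the real cluster it must run as root with the OSD stopped):

```shell
# Defaults from this incident; override via environment for testing.
OSD_DIR=${OSD_DIR:-/var/lib/ceph/osd/ceph-0}
OSD_USER=${OSD_USER:-ceph}

fix_ownership() {
    # Find every file under $1 not owned by user $2, print it as an
    # audit trail, then chown it; -exec ... + batches the chown calls.
    find "$1" ! -user "$2" -print -exec chown "$2:$2" {} + 2>/dev/null
}

fix_ownership "$OSD_DIR" "$OSD_USER"
```

Printing offenders before changing them makes it easy to see afterwards which files a stray root-owned process had created.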
Start the OSD:
[root@node1 DIR_0]# systemctl start ceph-osd@0
Check the cluster status:
[root@node1 DIR_0]# ceph -s
cluster 911c57dc-a930-4da8-ab0e-69f6b6586e3d
health HEALTH_WARN
too many PGs per OSD (328 > max 300)
monmap e1: 3 mons at {node1=192.168.1.141:6789/0,node2=192.168.1.142:6789/0,node3=192.168.1.143:6789/0}
election epoch 108, quorum 0,1,2 node1,node2,node3
fsmap e705662: 1/1/1 up {0=node1=up:active}, 2 up:standby
osdmap e705736: 3 osds: 3 up, 3 in
flags sortbitwise,require_jewel_osds
pgmap v1296809: 328 pgs, 12 pools, 457 MB data, 319 objects
1593 MB used, 44453 MB / 46046 MB avail
328 active+clean
At this point the cluster has recovered: all 328 PGs are active+clean and all 3 OSDs are up and in.
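Note that the final status is still HEALTH_WARN because of `too many PGs per OSD (328 > max 300)`. That is a pre-existing pool-sizing issue, unrelated to the failed OSD. On Jewel the proper fix is to plan per-pool PG counts; as a stopgap, the warning threshold can be raised (option name assumed from Jewel-era defaults; the value 400 is illustrative):

```shell
# Raise the threshold at runtime on all monitors (lost on restart):
ceph tell mon.* injectargs '--mon-pg-warn-max-per-osd 400'

# Or persist it in ceph.conf on the monitor hosts:
#   [mon]
#   mon_pg_warn_max_per_osd = 400
```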