A Huawei Gauss Database Deployment Incident

The pg_perf incident

Last updated: April 21, 2025

### Incident description

The Gauss database I use for testing suddenly stopped working. Because it is deployed with Docker, I logged in to the server and ran `docker ps -a|grep gauss`, which showed that the service had exited a long time ago.

### Checking the logs

The core problem found in the logs:

```shell
2025-04-21 08:41:08.554 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] FATAL: ERROR: could not create instr log directory "pg_perf": Permission denied
```

In other words, after Docker started Gauss, the process did not have sufficient permissions on my host machine and could not create the `pg_perf` directory.

The full log output was as follows:

```shell
openGauss Database directory appears to contain a database; Skipping initialization
0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG: [Alarm Module]Host Name: a2aaf89b6607
0 LOG: [Alarm Module]Host IP: a2aaf89b6607. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain
0 LOG: [Alarm Module]Get ENV GS_CLUSTER_NAME failed!
0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57
0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING: failed to parse feature control file: gaussdb.version.
0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
2025-04-21 08:41:08.297 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: when starting as multi_standby mode, we couldn't support data replicaton.
2025-04-21 08:41:08.297 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: base_page_saved_interval is 400, ori is 400.
2025-04-21 08:41:08.304 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
2025-04-21 08:41:08.304 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: [Alarm Module]Host Name: a2aaf89b6607
2025-04-21 08:41:08.304 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: [Alarm Module]Host IP: a2aaf89b6607. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain
2025-04-21 08:41:08.304 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: [Alarm Module]Get ENV GS_CLUSTER_NAME failed!
2025-04-21 08:41:08.304 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57
2025-04-21 08:41:08.306 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: loaded library "security_plugin"
2025-04-21 08:41:08.306 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2025-04-21 08:41:08.306 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2025-04-21 08:41:08.308 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2025-04-21 08:41:08.308 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: reserved memory for backend threads is: 220 MB
2025-04-21 08:41:08.308 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: reserved memory for WAL buffers is: 128 MB
2025-04-21 08:41:08.308 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: Set max backend reserve memory is: 348 MB, max dynamic memory is: 8139 MB
2025-04-21 08:41:08.308 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: shared memory 3288 Mbytes, memory context 8487 Mbytes, max process memory 12288 Mbytes
2025-04-21 08:41:08.392 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [CACHE] LOG: set data cache size(402653184)
2025-04-21 08:41:08.453 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [SEGMENT_PAGE] LOG: Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2025-04-21 08:41:08.502 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: gaussdb: fsync file "/var/lib/opengauss/data/gaussdb.state.temp" success
2025-04-21 08:41:08.502 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Normal), connection index(1)
2025-04-21 08:41:08.514 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: max_safe_fds = 982, usable_fds = 1000, already_open = 8
2025-04-21 08:41:08.520 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: the configure file /usr/local/opengauss/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
2025-04-21 08:41:08.520 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: Failed to parse cgroup config file.
2025-04-21 08:41:08.554 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [EXECUTOR] WARNING: Failed to obtain environment value $GAUSSLOG!
2025-04-21 08:41:08.554 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [EXECUTOR] DETAIL: N/A
2025-04-21 08:41:08.554 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [EXECUTOR] CAUSE: Incorrect environment value.
2025-04-21 08:41:08.554 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [EXECUTOR] ACTION: Please refer to backend log for more details.
2025-04-21 08:41:08.554 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] FATAL: ERROR: could not create instr log directory "pg_perf": Permission denied
2025-04-21 08:41:08.594 [unknown] [unknown] localhost 139652928928192 0[0:0#0] 0 [BACKEND] LOG: FiniNuma allocIndex: 0.
```

### Resolution

[Failed] Approach 1: adjust the permissions of the deployment directory and the mounted directory. Based on the log above, I changed the permissions of the data directory mounted into the database container, as well as the permissions of the whole location where the service is deployed. The container still failed to start.

[Failed] Approach 2: delete the original database and redeploy. No matter how many times I redeployed with `docker run`, it failed every time.

[Succeeded] Approach 3: delete the original service and also delete the corresponding image, then pull the image again and redeploy. This time the system started successfully.
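
The approach that worked (remove the container, remove the image, pull it again) can be sketched as a small shell helper. This is only an illustration: the function names `run` and `redeploy`, the `DRY_RUN` switch, and the container and image names in the usage line are hypothetical placeholders, not values from this incident. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of approach 3. Names below are hypothetical placeholders.
# DRY_RUN defaults to 1, so nothing is executed unless you opt in with DRY_RUN=0.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove the exited container, remove its image, then pull a fresh copy.
# $1 = container name or ID, $2 = image reference (repository:tag).
redeploy() {
    container=$1
    image=$2
    run docker rm "$container" &&
    run docker rmi "$image" &&
    run docker pull "$image"
}

# Example (dry run): redeploy my-gauss-container example/opengauss:some-tag
```

After the pull completes, start the container again with the same `docker run` options used for the original deployment; those options are not shown in this post, so they are left out of the sketch.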