Fix core dump when creating a table in coordinator restore mode #1
Hi Koichi,
In coordinator restore mode, I get a core dump when I create a table.
Steps to reproduce:
1. pg_ctl start -Z restoremode -D /rdbdata/bcrdb_data/coord
2. psql -hzhcx5i -p20015 cxdb
3. Create the table:
CREATE TABLE cm_busi_handle_201301 (
so_nbr bigint,
region_code integer,
process_id integer,
process_result integer,
handle_seq integer,
op_id integer,
oper_date timestamp without time zone,
oper_end_date timestamp without time zone,
invoice_no character varying(20),
property character varying(20),
oper_desc text
)
DISTRIBUTE BY MODULO (region_code)
TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8);
4. The backend crashes and dumps core:
cxdb=# CREATE TABLE cm_busi_handle_201301 (
cxdb(# so_nbr bigint,
cxdb(# region_code integer,
cxdb(# process_id integer,
cxdb(# process_result integer,
cxdb(# handle_seq integer,
cxdb(# op_id integer,
cxdb(# oper_date timestamp without time zone,
cxdb(# oper_end_date timestamp without time zone,
cxdb(# invoice_no character varying(20),
cxdb(# property character varying(20),
cxdb(# oper_desc text
cxdb(# )
cxdb-# DISTRIBUTE BY MODULO (region_code)
cxdb-# TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8);
The connection to the server was lost. Attempting reset: Failed.
!>
5. Stack trace from the core file:
gdb /rdbdata/bcrdb_install/bin/postgres /tmp/corefile/core.postgres.48524
(gdb) bt
#0 0x00000036a48328a5 in raise () from /lib64/libc.so.6
#1 0x00000036a4834085 in abort () from /lib64/libc.so.6
#2 0x00000036a486fa37 in __libc_message () from /lib64/libc.so.6
#3 0x00000036a4875366 in malloc_printerr () from /lib64/libc.so.6
#4 0x00000036a4877e93 in _int_free () from /lib64/libc.so.6
#5 0x0000000000769879 in AllocSetDelete (context=) at aset.c:551
#6 0x0000000000769dad in MemoryContextDelete (context=0x12231e8) at mcxt.c:193
#7 0x000000000076aa70 in PortalDrop (portal=0x122d0c0, isTopCommit=) at portalmem.c:588
#8 0x000000000067ddaa in exec_simple_query (
#9 0x000000000067f82f in PostgresMain (argc=, argv=, dbname=0x1166708 "cxdb",
#10 0x000000000063b84a in BackendRun (argc=, argv=) at postmaster.c:4202
#11 BackendStartup (argc=, argv=) at postmaster.c:3891
#12 ServerLoop (argc=, argv=) at postmaster.c:1702
#13 PostmasterMain (argc=, argv=) at postmaster.c:1369
#14 0x00000000005d1420 in main (argc=4, argv=0x1131c70) at main.c:206
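My analysis is as follows:
When the coordinator is started in restore mode, the pooler process is not running, so NumDataNodes is zero.
This breaks BuildRelationDistributionNodes, which sizes its nodeoids array from NumDataNodes. With NumDataNodes equal to 0, the array has no room for any node OID, so storing the OIDs of the nodes listed in TO NODE writes past the end of the allocation. The corruption is only detected later, when PostgreSQL frees the portal's memory context (PortalDrop, MemoryContextDelete, AllocSetDelete in the stack above) and glibc aborts in _int_free, which produces the core dump.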
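To see why the allocation size matters, here is a small standalone C sketch (not PostgreSQL code; `OidArray`, `oid_array_create` and `oid_array_push` are hypothetical names) that stores OIDs into a fixed-capacity array but checks the bound before every write. With a capacity of 0, which is what sizing by NumDataNodes gives in restore mode, the very first store is rejected; an unchecked store at that point is exactly the heap overflow described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int Oid; /* stand-in for PostgreSQL's Oid type */

/* Hypothetical fixed-capacity OID array, for illustration only */
typedef struct OidArray
{
    Oid *items;
    int  len;
    int  capacity;
} OidArray;

static OidArray
oid_array_create(int capacity)
{
    OidArray a;

    assert(capacity >= 0);
    /* calloc(0, ...) may return NULL or a unique pointer; either way no
     * element is ever written when the capacity is 0 */
    a.items = calloc((size_t) capacity, sizeof(Oid));
    a.len = 0;
    /* if the allocation failed (or capacity was 0), accept nothing */
    a.capacity = (a.items == NULL) ? 0 : capacity;
    return a;
}

/* Returns false instead of writing past the end of the allocation */
static bool
oid_array_push(OidArray *a, Oid oid)
{
    if (a->len >= a->capacity)
        return false;
    a->items[a->len++] = oid;
    return true;
}
```

Sizing the array from the TO NODE list (eight entries in the reproduction above) gives enough room for every store, while a capacity of 0 cannot hold even one.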
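The corrected code is as follows:
BuildRelationDistributionNodes(List *nodes, int *numnodes)
{
	Oid *nodeoids;
	ListCell *item;
	int numdatanotes;

	*numnodes = 0;

	/* Size the array from the TO NODE list instead of NumDataNodes,
	 * which is 0 when the pooler is not running (restore mode) */
	numdatanotes = list_length(nodes);
	nodeoids = (Oid *) palloc0(numdatanotes * sizeof(Oid));

	/* ... rest of the function unchanged ... */
}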
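For readers without a Postgres-XC tree at hand, the sketch below mirrors the intent of the fix in plain C: the output array is sized from the number of entries in the node list, which is an upper bound on the number of distinct datanodes, so the writes stay in bounds regardless of NumDataNodes. It is not the real function: `build_distribution_nodes` is a hypothetical name, the node-name lookup and error reporting are replaced by an array of already-resolved OIDs, and the duplicate-skipping loop is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid; /* stand-in for PostgreSQL's Oid type */

/*
 * Hypothetical standalone analogue of the fixed BuildRelationDistributionNodes.
 * node_oids/list_len play the role of the already-resolved TO NODE list.
 * The result is sized by list_len (never by a global node count).
 * The caller frees the returned array.
 */
static Oid *
build_distribution_nodes(const Oid *node_oids, int list_len, int *numnodes)
{
    Oid *result;

    assert(list_len >= 0);
    *numnodes = 0;

    /* calloc(0, ...) may return NULL; nothing is written in that case */
    result = calloc((size_t) list_len, sizeof(Oid));
    if (result == NULL)
        return NULL;

    for (int i = 0; i < list_len; i++)
    {
        int j;

        /* skip an OID that is already in the result (illustrative dedup) */
        for (j = 0; j < *numnodes; j++)
            if (result[j] == node_oids[i])
                break;

        if (j == *numnodes)
        {
            /* at most list_len distinct OIDs, so this never overflows */
            assert(*numnodes < list_len);
            result[(*numnodes)++] = node_oids[i];
        }
    }
    return result;
}
```

Because the capacity comes from the input itself, the same code path works whether or not the pooler has populated the global node count.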
I have pushed the change to my git repository and opened this pull request. Please review it, and let me know if there are any problems.
Thanks