fix coredump when I create table in coordinator restore mode #1

Open
wants to merge 1 commit into
base: master

@xuegang xuegang commented Jan 27, 2015

Hi koichi;
In coordinator restore mode, the backend dumps core when I create a table.

Steps to reproduce:
1. pg_ctl start -Z restoremode -D /rdbdata/bcrdb_data/coord
2. psql -hzhcx5i -p20015 cxdb
3. Create a table:

CREATE TABLE cm_busi_handle_201301 (
so_nbr bigint,
region_code integer,
process_id integer,
process_result integer,
handle_seq integer,
op_id integer,
oper_date timestamp without time zone,
oper_end_date timestamp without time zone,
invoice_no character varying(20),
property character varying(20),
oper_desc text
)
DISTRIBUTE BY MODULO (region_code)
TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8)

4. The backend crashes with a core dump:

cxdb=# CREATE TABLE cm_busi_handle_201301 (
cxdb(# so_nbr bigint,
cxdb(# region_code integer,
cxdb(# process_id integer,
cxdb(# process_result integer,
cxdb(# handle_seq integer,
cxdb(# op_id integer,
cxdb(# oper_date timestamp without time zone,
cxdb(# oper_end_date timestamp without time zone,
cxdb(# invoice_no character varying(20),
cxdb(# property character varying(20),
cxdb(# oper_desc text
cxdb(# )
cxdb-# DISTRIBUTE BY MODULO (region_code)
cxdb-# TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8);
The connection to the server was lost. Attempting reset: Failed.
!>

5. Stack trace from the core dump:

gdb /rdbdata/bcrdb_install/bin/postgres /tmp/corefile/core.postgres.48524
(gdb) bt
#0 0x00000036a48328a5 in raise () from /lib64/libc.so.6
#1 0x00000036a4834085 in abort () from /lib64/libc.so.6
#2 0x00000036a486fa37 in __libc_message () from /lib64/libc.so.6
#3 0x00000036a4875366 in malloc_printerr () from /lib64/libc.so.6
#4 0x00000036a4877e93 in _int_free () from /lib64/libc.so.6
#5 0x0000000000769879 in AllocSetDelete (context=) at aset.c:551
#6 0x0000000000769dad in MemoryContextDelete (context=0x12231e8) at mcxt.c:193
#7 0x000000000076aa70 in PortalDrop (portal=0x122d0c0, isTopCommit=) at portalmem.c:588
#8 0x000000000067ddaa in exec_simple_query (
    query_string=0x114c1e0 "CREATE TABLE cm_busi_handle_201301 (\n    so_nbr bigint,\n    region_code integer,\n    process_id integer,\n    process_result integer,\n    handle_seq integer,\n    op_id integer,\n    oper_date timestamp "...) at postgres.c:1149
#9 0x000000000067f82f in PostgresMain (argc=, argv=, dbname=0x1166708 "cxdb",
    username=<value optimized out>) at postgres.c:4243

#10 0x000000000063b84a in BackendRun (argc=, argv=) at postmaster.c:4202
#11 BackendStartup (argc=, argv=) at postmaster.c:3891
#12 ServerLoop (argc=, argv=) at postmaster.c:1702
#13 PostmasterMain (argc=, argv=) at postmaster.c:1369
#14 0x00000000005d1420 in main (argc=4, argv=0x1131c70) at main.c:206

My analysis is as follows:
When the coordinator is started in restore mode, the pooler process is not running, so NumDataNodes is zero.
This causes a problem in the function BuildRelationDistributionNodes:

{
    /*
     * In restore mode NumDataNodes is 0, so palloc0() is called with a size
     * of 0 and returns only the smallest possible chunk for nodeoids. Storing
     * one OID per listed node then writes past the end of that chunk, and the
     * backend dumps core when postgres later frees the memory.
     */
    nodeoids = (Oid *) palloc0(NumDataNodes * sizeof(Oid));
}
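The mismatch described above can be modelled outside PostgreSQL. The following is a minimal sketch, not the Postgres-XC code: `num_data_nodes`, `slots_allocated` and `slots_overflowed` are hypothetical names, and the sketch only counts the out-of-range writes instead of performing them.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the NumDataNodes global (hypothetical name).
 * 0 models a coordinator started in restore mode. */
static int num_data_nodes = 0;

/* Number of Oid slots reserved when the array is sized by the global. */
static size_t slots_allocated(void)
{
    assert(num_data_nodes >= 0);
    return (size_t) num_data_nodes;
}

/* Number of writes that would land past the end of the array when the
 * loop stores one entry per element of the node list. */
static size_t slots_overflowed(size_t list_len)
{
    size_t cap = slots_allocated();

    return list_len > cap ? list_len - cap : 0;
}
```

With `num_data_nodes` at 0 and the eight datanodes from the CREATE TABLE statement, every one of the eight stores falls outside the allocation, which is consistent with the heap corruption reported by `_int_free` in the stack trace.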

The corrected code is as follows:

BuildRelationDistributionNodes(List *nodes, int *numnodes)
{
    Oid *nodeoids;
    ListCell *item;
    int numdatanotes;

    *numnodes = 0;

    numdatanotes = list_length(nodes);
    nodeoids = (Oid *) palloc0(numdatanotes * sizeof(Oid));

}
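The idea behind the fix, sizing the array by the list that is actually iterated, can be shown as a standalone sketch. It assumes plain `calloc` in place of `palloc0` and a plain array in place of a PostgreSQL `List`; `build_node_oids` and the `Oid` typedef here are illustrative stand-ins, not the Postgres-XC definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid typedef */

/*
 * Copy n node OIDs into a freshly allocated array whose size is derived
 * from n itself, so the capacity can never be smaller than the number of
 * writes. Returns NULL (with *numnodes == 0) when n is 0 or allocation
 * fails. The caller releases the result with free().
 */
static Oid *build_node_oids(const Oid *src, size_t n, int *numnodes)
{
    Oid    *nodeoids;
    size_t  i;

    assert(numnodes != NULL);
    *numnodes = 0;

    if (n == 0)
        return NULL;

    nodeoids = calloc(n, sizeof(Oid));
    if (nodeoids == NULL)
        return NULL;

    for (i = 0; i < n; i++)
    {
        nodeoids[i] = src[i];
        (*numnodes)++;
    }
    return nodeoids;
}
```

Because the capacity and the loop bound come from the same value, the result no longer depends on whether a separate global such as NumDataNodes has been initialised.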

I have pushed the changed code to my git repository and opened this pull request. Please review it, and let me know if there is any problem.

Thanks

@koichi-szk
Member

Hello;

I'm very sorry that I've left this thread for a long time. I think this fix can be put into the master and related stable branches.

I'm also renewing the whole PGXC project. At present, the master repository is on SourceForge, and there are slight conflicts between the SourceForge and GitHub repositories. So I'm resetting the GitHub repository and pushing everything from SourceForge to make GitHub the master repository.

The patch will be kept in the current SourceForge repository and brought back here afterwards.

You may see the repository disappear and then come back in an initialized state. Please understand that the patch is kept elsewhere and will be brought back.

I'd like you to continue to contribute to this project.

Thank you.

Koichi Suzuki
