Ticket #41 (closed defect: fixed)

Opened 22 months ago

Last modified 21 months ago

pvfs2-server segfaults when num_dfiles larger than number of servers

Reported by: mtmoore Owned by: ligon
Priority: major Milestone:
Component: Server Version: latest
Keywords: Cc:

Description

In a single-server setup, if the number of datafiles is set to 2 on a directory using the extended attribute user.pvfs2.num_dfiles, the server segfaults when copying a file into the file system (via pvfs2-cp).

[E 08/16/2011 23:01:55] PVFS2 server: signal 6
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(gsignal+0x35) [0x3c2e830265]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(gsignal+0x35) [0x3c2e830265]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(abort+0x110) [0x3c2e831d10]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(assert_fail+0xf6) [0x3c2e8296e6]
[E 08/16/2011 23:01:55] [bt] install/cvs-orange-head/sbin/pvfs2-server [0x424bac]
[E 08/16/2011 23:01:55] [bt] /home/mtmoore/pvfs/install/cvs-orange-head/sbin/pvfs2-server(job_bmi_unexp_cancel+0x21) [0x420ef1]
[E 08/16/2011 23:01:55] [bt] install/cvs-orange-head/sbin/pvfs2-server [0x414416]
[E 08/16/2011 23:01:55] [bt] /lib64/libpthread.so.0 [0x3c2f40eb10]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(write+0x4b) [0x3c2e8c680b]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(_IO_file_write+0x43) [0x3c2e86bc03]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(_IO_do_write+0x76) [0x3c2e86bb16]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(_IO_file_sync+0xa7) [0x3c2e86c147]
[E 08/16/2011 23:01:55] [bt] /lib64/libc.so.6(fflush+0x20) [0x3c2e8612f0]
[E 08/16/2011 23:01:55] [bt] install/cvs-orange-head/sbin/pvfs2-server [0x44f675]
[E 08/16/2011 23:01:55] [bt] install/cvs-orange-head/sbin/pvfs2-server(gossip_debug+0x85) [0x44fbf5]

(gdb) l *(0x44f675)
0x44f675 is in tree_get_file_size_comp_fn (../../src/cvs-orange-head/pvfs2/./src/server/tree-communicate.sm:546).

Change History

Changed 21 months ago by ligon

  • owner changed from parl to ligon
  • status changed from new to assigned

Changed 21 months ago by ligon

All:

I have corrected the server segfault in both orangefs-2-8-5 and
Orange-Branch. The problem occurred when a pvfs2-cp was issued and the
server-side "unstuff" machine tried to create additional handles in a
one-server environment. The segfault reared its ugly head when the server
tried to get handles from pools that were never created. In a one-server
environment, you would expect all handles to be local handles, so trying
to get handles from a pool doesn't make sense at all. However, the real
problem was that additional handles should never have been requested in
the first place!

In this particular situation, the pvfs2-cp was issued against a parent
directory defined with user.pvfs2.num_dfiles=3. So, sys-create on the
client side passed that value to the create machine on the server side,
which stored that value as part of the "stuffed" metadata information for
the new file. Then the "sys-io" machine called "unstuff", which tried to
create the 2 additional handles based on the "stuffed" metadata value of 3.
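To make the failing arithmetic concrete, here is a minimal C sketch under a simplified, hypothetical model. It takes from the text above that "unstuff" requests num_dfiles - 1 additional handles (2 for a stuffed value of 3). It additionally assumes that each additional handle must come from the pool of a distinct remote I/O server, so a one-server setup has no pools at all. The names unstuff_additional_handles and unstuff_request_is_valid are illustrative only and are not OrangeFS functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not OrangeFS code: a stuffed file already owns one
 * local dfile, so "unstuff" asks for the remaining num_dfiles - 1 handles. */
static int unstuff_additional_handles(int stuffed_num_dfiles)
{
    assert(stuffed_num_dfiles >= 1);
    return stuffed_num_dfiles - 1;
}

/* Assumption for this sketch: each additional handle is drawn from the pool
 * of a distinct remote I/O server, and a one-server setup has no remote
 * pools, so any request for additional handles there cannot be satisfied. */
static bool unstuff_request_is_valid(int stuffed_num_dfiles, int num_io_servers)
{
    assert(num_io_servers >= 1);
    int remote_pools = num_io_servers - 1;
    return unstuff_additional_handles(stuffed_num_dfiles) <= remote_pools;
}
```

With the ticket's values (a stuffed value of 3 on one server), the model asks for 2 handles from 0 pools, which is the invalid case; a stuffed value of 1 asks for none and is always valid.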

To prevent sending a request to the server asking for more handles than
there are I/O servers, I modified the function
PINT_cached_config_get_num_dfiles to cap the returned number of dfiles at
the number of I/O servers and issue a gossip_err() whenever this happens.
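The capping rule can be sketched in a few lines of C. This is a minimal illustration, not the actual patch: the real PINT_cached_config_get_num_dfiles works from the cached file system configuration and its signature is not reproduced here. In this sketch the requested count and server count are plain parameters, the name cap_num_dfiles is hypothetical, and fprintf(stderr, ...) stands in for gossip_err().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the cap added to
 * PINT_cached_config_get_num_dfiles(): never return more dfiles than there
 * are I/O servers, and log an error whenever the request is reduced. */
static int cap_num_dfiles(int requested, int num_io_servers)
{
    assert(num_io_servers > 0);
    if (requested > num_io_servers) {
        /* fprintf(stderr, ...) stands in for gossip_err() in this sketch. */
        fprintf(stderr,
                "num_dfiles requested (%d) exceeds number of I/O servers (%d); "
                "using %d\n",
                requested, num_io_servers, num_io_servers);
        return num_io_servers;
    }
    return requested;
}
```

For the ticket's case, cap_num_dfiles(3, 1) logs the error and returns 1, so the value later stored for the file can no longer exceed what a single server can provide.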

For example, in the situation above, sys-create now asks for only 1
handle, so only 1 is stored as the "stuffed" metadata value. When "sys-io"
calls "unstuff", the requested number of handles is 1, which prevents the
"unstuff" machine from providing more handles. (This describes the
behavior before the optimization below was added.)

If "file stuffing" is turned off in the config file, the number of
datafiles created will never be greater than the number of I/O servers.
That cap was already built into the system; it was simply overlooked
when file stuffing was added.

I also added an optimization. In the server-side create, if file stuffing
is turned on but the number of dfiles requested is 1, then "stuffed"
metadata values are NOT stored. Thus, the client side IO machine will NOT
send an "unstuff" request to the server unnecessarily.
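The create-time decision behind this optimization reduces to a single predicate. The following is a hypothetical C sketch (store_stuffed_metadata is an illustrative name, not an OrangeFS function). It assumes, per the text above, that stuffed metadata is recorded only when file stuffing is enabled and more than one dfile was requested, and that without that metadata the client IO machine has no reason to send "unstuff".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the server-side create decision: record "stuffed"
 * metadata only when file stuffing is enabled AND more than one dfile was
 * requested. A file created with a single dfile carries no stuffed
 * metadata, so the client never has a reason to ask for it to be unstuffed. */
static bool store_stuffed_metadata(bool file_stuffing_enabled, int num_dfiles)
{
    assert(num_dfiles >= 1);
    return file_stuffing_enabled && num_dfiles > 1;
}
```

In the ticket's scenario the cap reduces the request to 1 dfile, so store_stuffed_metadata(true, 1) is false and no "unstuff" round trip is needed.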

Changed 21 months ago by ligon

  • status changed from assigned to closed
  • resolution set to fixed