Ticket #29 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

PVFS Admin apps and client-core fail in a configuration where ib and tcp are both used.

Reported by: ligon Owned by: ligon
Priority: major Milestone:
Component: Admin apps Version: latest
Keywords: Cc:

Description (last modified by ligon) (diff)

With a configuration that uses both the IB and TCP BMI modules, every admin app works until its finalize step, where a double free occurs. The client core hangs as well. Both failures occur when the pvfs2tab file contains BOTH ib and tcp definitions.
Example:

Config file has the following entry:

  <Aliases>
   Alias devorange21 ib://devorange21ib:3335,tcp://devorange21:3334
  </Aliases>

Tab file has the following:

    ib://devorange21ib:3335/pvfs2-fs /pvfs2-ib  pvfs2 defaults 0 0
    tcp://devorange21:3334/pvfs2-fs  /pvfs2-tcp pvfs2 defaults 0 0


Change History

Changed 2 years ago by ligon

  • owner changed from parl to ligon
  • status changed from new to accepted

Changed 2 years ago by ligon

  • status changed from accepted to assigned

Changed 2 years ago by ligon

  • description modified (diff)

Changed 2 years ago by ligon

  • description modified (diff)

Changed 2 years ago by ligon

  • description modified (diff)

Changed 2 years ago by ligon

  • description modified (diff)

Changed 2 years ago by ligon

  • description modified (diff)

Changed 2 years ago by ligon

As it turns out, the number of entries in the pvfs2tab file doesn't matter. The problem is that BMI_ib_method_addr_lookup in the IB interface returns a pointer to an existing map rather than a copy of it. As a result, the client's cur_ref_list contained two entries, one for ib://devorange21ib:3335 and one for ib://devorange21ib:3335,tcp://devorange21:3334. The cur_ref_list treated these as two separate entries, yet both referred to the same IB map, so freeing each entry at finalize freed that map twice. The fix: when cleaning up the entries in the cur_ref_list, we now check a reference counter and free the memory only when the count reaches zero.
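
The reference-counting idea described above can be sketched roughly as follows. This is a minimal illustration and not the actual PVFS change: struct shared_map, map_create, map_ref, map_unref and the maps_freed counter are all hypothetical names invented for the example. It assumes that each list entry holding the shared pointer takes one reference and drops it at cleanup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a map that several list entries may point at. */
struct shared_map {
    int ref_count;   /* number of list entries currently holding this map */
    char *peer;      /* illustrative payload owned by the map */
};

/* Counts maps actually freed; exists only to make the behaviour observable. */
static int maps_freed = 0;

/* Allocate a map with a single reference held by the caller. */
static struct shared_map *map_create(const char *peer)
{
    struct shared_map *m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;
    m->peer = malloc(strlen(peer) + 1);
    if (m->peer == NULL) {
        free(m);
        return NULL;
    }
    strcpy(m->peer, peer);
    m->ref_count = 1;
    return m;
}

/* Take another reference when a lookup hands out the same pointer again. */
static struct shared_map *map_ref(struct shared_map *m)
{
    assert(m != NULL && m->ref_count > 0);
    m->ref_count++;
    return m;
}

/* Drop one reference; release the memory only when the count reaches zero. */
static void map_unref(struct shared_map *m)
{
    assert(m != NULL && m->ref_count > 0);
    if (--m->ref_count == 0) {
        free(m->peer);
        free(m);
        maps_freed++;
    }
}
```

With this shape, two list entries that share one map each call map_unref during cleanup, and only the second call frees the memory.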

The MX interface has the same issue. All other protocols generate copies instead of pointers. I am going to leave this trac ticket open until I can also test and resolve the MX issue.

Changed 2 years ago by ligon

The MX interface does NOT have the same issue as the IB interface; it creates a copy of the mx map instead of passing pointers around. However, using mx with a second protocol did fail. BMX_parse_peername expected the alias string to contain *only* the mx server, so in a two-protocol configuration the alias string failed its validation checks. To correct the problem, I first strip everything but the mx alias from the string, using the key_string function already defined in the code. That function also strips the leading "mx://" from the string, so the checks were modified to allow for this.
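
To illustrate the kind of parsing involved, here is a rough sketch of pulling one protocol's entry out of a comma-separated alias string and dropping its scheme prefix. It is not the key_string function from the PVFS source; extract_proto_addr is a hypothetical helper written for this example, and it assumes entries are separated by commas with no surrounding whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the PVFS key_string function): find the entry in a
 * comma-separated alias list that starts with `scheme` (for example "mx://")
 * and copy the text after the scheme into `out`.
 * Returns 0 on success, -1 if no entry matches or `out` is too small.
 */
static int extract_proto_addr(const char *alias, const char *scheme,
                              char *out, size_t outlen)
{
    assert(alias != NULL && scheme != NULL && out != NULL);

    size_t scheme_len = strlen(scheme);
    const char *p = alias;

    while (*p != '\0') {
        /* Length of the current entry, up to the next comma or end of string. */
        size_t entry_len = strcspn(p, ",");

        if (entry_len >= scheme_len && strncmp(p, scheme, scheme_len) == 0) {
            size_t addr_len = entry_len - scheme_len;
            if (addr_len + 1 > outlen)
                return -1;              /* output buffer too small */
            memcpy(out, p + scheme_len, addr_len);
            out[addr_len] = '\0';
            return 0;
        }

        p += entry_len;
        if (*p == ',')
            p++;                        /* skip the separator */
    }
    return -1;                          /* no entry for this scheme */
}
```

Given the alias from the description, asking for "tcp://" yields devorange21:3334, while asking for "mx://" fails because that alias has no mx entry.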

In testing, I found that the mx protocol, by itself or with a second protocol, didn't pass the nightly tests. I will open another ticket to cover this problem.

Changed 2 years ago by ligon

  • status changed from assigned to closed
  • resolution set to fixed