Changeset 8313

Timestamp: 04/29/10 18:09:46 (3 years ago)
Author: iskra
Message: Add support for immediate completion on the server side.
Location: trunk/src/io/bmi/bmi_zoid
Files: 2 modified

  • trunk/src/io/bmi/bmi_zoid/README

--- trunk/src/io/bmi/bmi_zoid/README (r8309)
+++ trunk/src/io/bmi/bmi_zoid/README (r8313)
@@ -17,5 +17,12 @@
   needed, but is probably already larger than necessary,
 
-- FIXME ZBMI limits
+- ZBMI_SHM_SIZE_TOTAL (in init.c in ZeptoOS
+  packages/zoid/src/zbmi/implementation/ directory): total size of the shared
+  memory buffer used to exchange bulk data between the ZOID daemon and the BMI
+  server; defaults to 512M.
+
+- ZBMI_SHM_SIZE_UNEXP (in init.c in ZeptoOS
+  packages/zoid/src/zbmi/implementation/ directory): part of the shared
+  memory buffer used for unexpected messages; defaults to 1M.
 
 
@@ -103,13 +110,13 @@
 
 The zbmi plugin is mostly stateless so far as the compute node clients are
-concerned.  Specifically, the information on posted, but not immediately
-completed expected message sends/receives is stored exclusively on the
-client side.
+concerned.  Specifically, the information on posted expected message
+sends/receives that were not immediately completed is stored exclusively on
+the client side.
 
 All BMI send routines end up in zoid_post_send_common.  That includes
 unexpected messages and list I/O.  This routine attempts to forward the
 message to the zbmi plugin on the I/O node, using zbmi_send.  For
-unexpected messages, zbmi_send is normally expected to succeed and result
-in an immediate completion; however, if the zbmi plugin is out of memory,
+unexpected messages, zbmi_send should normally succeed and result in an
+immediate completion; however, if the zbmi plugin is out of memory,
 zbmi_send will fail with ENOMEM.  The same failure will occur with expected
 messages if a matching receive has not been posted on the I/O node side by
@@ -119,5 +126,7 @@
 resulting in an immediate completion.  zbmi_send normally does not block,
 but if BMI_ZOID_POST_TIMEOUT has been enabled, it can, waiting for a
-matching expected message post from the BMI server side.
+matching expected message post from the BMI server side or for memory to be
+released on the BMI server side so that an unexpected message can be
+stored.
 
 The way zbmi_send is forwarded by ZOID, the data payload is only
@@ -149,8 +158,9 @@
 using zbmi_test.  zbmi_test can block on the server for the specified time
 if none of the specified requests is initially ready.  zbmi_test returns
-the number of ready requests; if it is non-zero, then zoid_test_common next
-attempts to satisfy those requests by invoking zbmi_send/zbmi_recv.  Those
-send/recv routines could still fail in spite of a successful test, if there
-is no memory, or if the server-side canceled its matching request; this is
+the number of ready requests; if it is non-zero, then the server side must
+have posted matching sends/receives, so zoid_test_common next attempts to
+satisfy those "ready" requests by invoking zbmi_send/zbmi_recv.  Those
+send/recv routines could still fail in spite of a successful test if there
+is no memory or if the server side canceled its matching request; this is
 recoverable.
 
@@ -190,9 +200,14 @@
 retry the allocation after every BMI_memfree.
 
-Expected server-side posts, be it sends or receives, are never completed
-immediately: we send a message descriptor to the zbmi plugin which
-registers it and just sends back a confirmation.  When registering we
-exchange the internal BMI id and the internal ZOID id, since that
-simplifies subsequent testing/canceling.
+Expected server-side posts, be it sends or receives, can be completed
+immediately if a matching client-side post (or test) is waiting when
+server-side post is issued.  Note that "immediately" is used liberally
+here; the server-side post will not return until the buffer has been
+transferred to/from the compute node, which can take some time when the
+ZOID server is under heavy load.  For posts not completed immediately we
+send a message descriptor to the zbmi plugin which registers the message
+and just sends back a confirmation.  When registering we exchange the
+internal BMI id and the internal ZOID id, since that simplifies subsequent
+testing/canceling.
 
 Canceling messages is more complex than on the client side.  Generally, we
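
The revised README paragraph on expected server-side posts describes two outcomes: the post completes immediately, or its descriptor is registered for later testing. The following is a minimal sketch of that decision, not the code from server.c. The `fake_resp` structure and the `classify_post` helper are hypothetical names introduced for illustration; only the convention that a reply id of -1 signals immediate completion, and the meaning of the `not_immediate` and `length` parameters, are taken from this changeset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the plugin's reply.  Only the "-1 means
   immediate completion" convention comes from the changeset.  */
struct fake_resp
{
    int zoid_id;   /* -1: completed immediately; otherwise a registered id */
    int length;    /* bytes transferred; meaningful for receives */
};

/* Returns 0 if the post was registered for later testing and 1 if it
   completed immediately.  On immediate completion the transferred length
   is stored in *length when length is non-NULL.  */
static int
classify_post(const struct fake_resp *resp, int not_immediate, int *length)
{
    if (resp->zoid_id == -1)
    {
        /* A caller that asked for no immediate completion must never
           receive one.  */
        assert(!not_immediate);
        if (length)
            *length = resp->length;
        return 1;
    }

    /* Registered: the caller keeps the op and tests or cancels it later.  */
    return 0;
}
```

A caller that gets 1 back can release the operation right away, while a caller that gets 0 keeps it queued; failure paths (negative return values) are left out of this sketch.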
  • trunk/src/io/bmi/bmi_zoid/server.c

--- trunk/src/io/bmi/bmi_zoid/server.c (r8309)
+++ trunk/src/io/bmi/bmi_zoid/server.c (r8313)
@@ -88,5 +88,5 @@
 static bmi_method_addr_p get_client_addr(int zoid_addr);
 static int enqueue_no_mem(method_op_p op, bmi_size_t total_size);
-static int send_post_cmd(method_op_p op);
+static int send_post_cmd(method_op_p op, int not_immediate, int* length);
 
 
@@ -272,5 +272,5 @@
             free(desc);
 
-            if ((op->error_code = -send_post_cmd(op)))
+            if ((op->error_code = -send_post_cmd(op, 1, NULL)))
             {
                 gen_mutex_lock(&error_ops_mutex);
@@ -382,4 +382,5 @@
 {
     method_op_p new_op;
+    int ret;
 
     /* Server-side sends are never immediate, so we start by allocating a
@@ -446,5 +447,14 @@
     }
 
-    return send_post_cmd(new_op);
+    if ((ret = send_post_cmd(new_op, 0, NULL)) == 1)
+    {
+        /* Immediate completion.  */
+        if (buffer_type == BMI_EXT_ALLOC)
+            BMI_zoid_server_memfree(METHOD_DATA(new_op)->tmp_buffer);
+
+        bmi_dealloc_method_op(new_op);
+    }
+
+    return ret;
 }
 
@@ -460,4 +470,5 @@
 {
     method_op_p new_op;
+    int ret, length;
 
     /* Server-side receives are never immediate, so we start by allocating a
@@ -516,5 +527,33 @@
     }
 
-    return send_post_cmd(new_op);
+    if ((ret = send_post_cmd(new_op, 0, &length)) == 1)
+    {
+        /* Immediate completion.  */
+        *total_actual_size = length;
+
+        if (buffer_type == BMI_EXT_ALLOC)
+        {
+            /* Copy the memory back to the user buffer(s).  */
+            int j, size_remaining = length;
+            void *buf_cur = METHOD_DATA(new_op)->tmp_buffer;
+            j = 0;
+            while (size_remaining > 0)
+            {
+                int tocopy = (new_op->size_list[j] < size_remaining ?
+                              new_op->size_list[j] : size_remaining);
+
+                memcpy(new_op->buffer_list[j], buf_cur, tocopy);
+                buf_cur += tocopy;
+                size_remaining -= tocopy;
+                j++;
+            }
+
+            BMI_zoid_server_memfree(METHOD_DATA(new_op)->tmp_buffer);
+        }
+
+        bmi_dealloc_method_op(new_op);
+    }
+
+    return ret;
 }
 
@@ -1034,7 +1073,15 @@
 }
 
-/* A common internal posting routine for send and receive requests.  */
+/* A common internal posting routine for send and receive requests.
+   "not_immediate" is used for messages triggered from memfree; it is
+   inconvenient at that point for messages to succeed immediately (we
+   would need a separate queue for them).
+   The function returns 0 if posted successfully, 1 for immediate
+   completion, and a negative value if failed.
+   For immediate completions of receives, the length of the received
+   message is stored in *length;
+*/
 static int
-send_post_cmd(method_op_p op)
+send_post_cmd(method_op_p op, int not_immediate, int* length)
 {
     mqd_t reply_queue;
@@ -1059,4 +1106,5 @@
            ZBMI_CONTROL_POST_RECV);
     cmd->queue_id = queue_id;
+    cmd->not_immediate = not_immediate;
     cmd->bmi_id = op->op_id;
     cmd->buf.addr = ((struct zoid_addr*)op->addr->method_data)->pid;
@@ -1107,4 +1155,13 @@
         return -BMI_ENOMEM;
     }
+    else if (resp.zoid_id == -1)
+    {
+        /* Immediate completion.  */
+        assert(!not_immediate);
+        gen_mutex_unlock(&METHOD_DATA(op)->post_mutex);
+        if (length)
+            *length = resp.length;
+        return 1;
+    }
 
     METHOD_DATA(op)->zoid_buf_id = resp.zoid_id;
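
The receive path above copies an immediately completed message from the temporary buffer back into the user's buffer list. The following standalone sketch shows one way to write such a scatter copy; it is an illustration under stated assumptions, not the changeset's code. The function name `scatter_copy` and its `count` parameter are hypothetical: the loop in the changeset has no explicit bound on the list index and appears to rely on the received length never exceeding the sum of the list sizes. The sketch also uses a `char` pointer and `size_t` where the changeset uses `void *` arithmetic (a GNU C extension) and `int`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy up to `length` bytes from the contiguous buffer `src` into the
   `count` destination buffers described by buffer_list/size_list.
   Returns the number of bytes actually copied, which is less than
   `length` only when the destination buffers are too small.  */
static size_t
scatter_copy(void *const *buffer_list, const size_t *size_list,
             size_t count, const void *src, size_t length)
{
    const char *cur = src;      /* char pointer keeps the arithmetic portable */
    size_t remaining = length;
    size_t j;

    assert(buffer_list != NULL && size_list != NULL);

    /* Stop when the message is exhausted or the list runs out.  */
    for (j = 0; j < count && remaining > 0; j++)
    {
        size_t tocopy = size_list[j] < remaining ? size_list[j] : remaining;

        memcpy(buffer_list[j], cur, tocopy);
        cur += tocopy;
        remaining -= tocopy;
    }

    return length - remaining;
}
```

Returning the copied byte count lets a caller detect a message longer than the posted buffers instead of reading past the end of the list.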