Changeset 8313
Timestamp: 04/29/10 18:09:46 (3 years ago)
Location: trunk/src/io/bmi/bmi_zoid
Files: 2 modified
trunk/src/io/bmi/bmi_zoid/README
--- trunk/src/io/bmi/bmi_zoid/README (r8309)
+++ trunk/src/io/bmi/bmi_zoid/README (r8313)
@@ -17,5 +17,12 @@
 needed, but is probably already larger than necessary,
 
-- FIXME ZBMI limits
+- ZBMI_SHM_SIZE_TOTAL (in init.c in ZeptoOS
+packages/zoid/src/zbmi/implementation/ directory): total size of the shared
+memory buffer used to exchange bulk data between the ZOID daemon and the BMI
+server; defaults to 512M.
+
+- ZBMI_SHM_SIZE_UNEXP (in init.c in ZeptoOS
+packages/zoid/src/zbmi/implementation/ directory): part of the shared
+memory buffer used for unexpected messages; defaults to 1M.
 
 
@@ -103,13 +110,13 @@
 
 The zbmi plugin is mostly stateless so far as the compute node clients are
-concerned. Specifically, the information on posted, but not immediately
-completed expected message sends/receives is stored exclusively on the
-client side.
+concerned. Specifically, the information on posted expected message
+sends/receives that were not immediately completed is stored exclusively on
+the client side.
 
 All BMI send routines end up in zoid_post_send_common. That includes
 unexpected messages and list I/O. This routine attempts to forward the
 message to the zbmi plugin on the I/O node, using zbmi_send. For
-unexpected messages, zbmi_send is normally expected to succeed and result
-in an immediate completion; however, if the zbmi plugin is out of memory,
+unexpected messages, zbmi_send should normally succeed and result in an
+immediate completion; however, if the zbmi plugin is out of memory,
 zbmi_send will fail with ENOMEM. The same failure will occur with expected
 messages if a matching receive has not been posted on the I/O node side by
@@ -119,5 +126,7 @@
 resulting in an immediate completion. zbmi_send normally does not block,
 but if BMI_ZOID_POST_TIMEOUT has been enabled, it can, waiting for a
-matching expected message post from the BMI server side.
+matching expected message post from the BMI server side or for memory to be
+released on the BMI server side so that an unexpected message can be
+stored.
 
 The way zbmi_send is forwarded by ZOID, the data payload is only
@@ -149,8 +158,9 @@
 using zbmi_test. zbmi_test can block on the server for the specified time
 if none of the specified requests is initially ready. zbmi_test returns
-the number of ready requests; if it is non-zero, then zoid_test_common next
-attempts to satisfy those requests by invoking zbmi_send/zbmi_recv. Those
-send/recv routines could still fail in spite of a successful test, if there
-is no memory, or if the server-side canceled its matching request; this is
+the number of ready requests; if it is non-zero, then the server side must
+have posted matching sends/receives, so zoid_test_common next attempts to
+satisfy those "ready" requests by invoking zbmi_send/zbmi_recv. Those
+send/recv routines could still fail in spite of a successful test if there
+is no memory or if the server side canceled its matching request; this is
 recoverable.
 
@@ -190,9 +200,14 @@
 retry the allocation after every BMI_memfree.
 
-Expected server-side posts, be it sends or receives, are never completed
-immediately: we send a message descriptor to the zbmi plugin which
-registers it and just sends back a confirmation. When registering we
-exchange the internal BMI id and the internal ZOID id, since that
-simplifies subsequent testing/canceling.
+Expected server-side posts, be it sends or receives, can be completed
+immediately if a matching client-side post (or test) is waiting when
+server-side post is issued. Note that "immediately" is used liberally
+here; the server-side post will not return until the buffer has been
+transferred to/from the compute node, which can take some time when the
+ZOID server is under heavy load. For posts not completed immediately we
+send a message descriptor to the zbmi plugin which registers the message
+and just sends back a confirmation. When registering we exchange the
+internal BMI id and the internal ZOID id, since that simplifies subsequent
+testing/canceling.
 
 Canceling messages is more complex than on the client side. Generally, we
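
The README text above says that an unexpected-message send fails with ENOMEM when the zbmi plugin has no room left, and that it can succeed once memory is released on the BMI server side. The sketch below models only that accounting. The struct and function names are hypothetical and are not part of the ZOID or BMI API; the real region size is governed by ZBMI_SHM_SIZE_UNEXP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the unexpected-message region; not the ZOID API. */
struct unexp_pool {
    size_t capacity; /* bytes reserved for unexpected messages */
    size_t used;     /* bytes held by messages not yet consumed */
};

/* Try to store an unexpected message of len bytes.
   Returns 0 on success, or -ENOMEM if the region cannot hold it. */
static int unexp_store(struct unexp_pool *p, size_t len)
{
    assert(p->used <= p->capacity);
    if (len > p->capacity - p->used)
        return -ENOMEM;
    p->used += len;
    return 0;
}

/* Release len bytes once the server side has consumed a message. */
static void unexp_release(struct unexp_pool *p, size_t len)
{
    assert(len <= p->used);
    p->used -= len;
}
```

A caller that sees -ENOMEM would retry after a release, which is roughly what waiting under BMI_ZOID_POST_TIMEOUT amounts to in this simplified model.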
trunk/src/io/bmi/bmi_zoid/server.c
--- trunk/src/io/bmi/bmi_zoid/server.c (r8309)
+++ trunk/src/io/bmi/bmi_zoid/server.c (r8313)
@@ -88,5 +88,5 @@
 static bmi_method_addr_p get_client_addr(int zoid_addr);
 static int enqueue_no_mem(method_op_p op, bmi_size_t total_size);
-static int send_post_cmd(method_op_p op);
+static int send_post_cmd(method_op_p op, int not_immediate, int* length);
 
 
@@ -272,5 +272,5 @@
     free(desc);
 
-    if ((op->error_code = -send_post_cmd(op)))
+    if ((op->error_code = -send_post_cmd(op, 1, NULL)))
     {
         gen_mutex_lock(&error_ops_mutex);
@@ -382,4 +382,5 @@
 {
     method_op_p new_op;
+    int ret;
 
     /* Server-side sends are never immediate, so we start by allocating a
@@ -446,5 +447,14 @@
     }
 
-    return send_post_cmd(new_op);
+    if ((ret = send_post_cmd(new_op, 0, NULL)) == 1)
+    {
+        /* Immediate completion. */
+        if (buffer_type == BMI_EXT_ALLOC)
+            BMI_zoid_server_memfree(METHOD_DATA(new_op)->tmp_buffer);
+
+        bmi_dealloc_method_op(new_op);
+    }
+
+    return ret;
 }
 
@@ -460,4 +470,5 @@
 {
     method_op_p new_op;
+    int ret, length;
 
     /* Server-side receives are never immediate, so we start by allocating a
@@ -516,5 +527,33 @@
     }
 
-    return send_post_cmd(new_op);
+    if ((ret = send_post_cmd(new_op, 0, &length)) == 1)
+    {
+        /* Immediate completion. */
+        *total_actual_size = length;
+
+        if (buffer_type == BMI_EXT_ALLOC)
+        {
+            /* Copy the memory back to the user buffer(s). */
+            int j, size_remaining = length;
+            void *buf_cur = METHOD_DATA(new_op)->tmp_buffer;
+            j = 0;
+            while (size_remaining > 0)
+            {
+                int tocopy = (new_op->size_list[j] < size_remaining ?
+                              new_op->size_list[j] : size_remaining);
+
+                memcpy(new_op->buffer_list[j], buf_cur, tocopy);
+                buf_cur += tocopy;
+                size_remaining -= tocopy;
+                j++;
+            }
+
+            BMI_zoid_server_memfree(METHOD_DATA(new_op)->tmp_buffer);
+        }
+
+        bmi_dealloc_method_op(new_op);
+    }
+
+    return ret;
 }
 
@@ -1034,7 +1073,15 @@
 }
 
-/* A common internal posting routine for send and receive requests. */
+/* A common internal posting routine for send and receive requests.
+   "not_immediate" is used for messages triggered from memfree; it is
+   inconvenient at that point for messages to succeed immediately (we
+   would need a separate queue for them).
+   The function returns 0 if posted successfully, 1 for immediate
+   completion, and a negative value if failed.
+   For immediate completions of receives, the length of the received
+   message is stored in *length;
+*/
 static int
-send_post_cmd(method_op_p op)
+send_post_cmd(method_op_p op, int not_immediate, int* length)
 {
     mqd_t reply_queue;
@@ -1059,4 +1106,5 @@
                  ZBMI_CONTROL_POST_RECV);
     cmd->queue_id = queue_id;
+    cmd->not_immediate = not_immediate;
     cmd->bmi_id = op->op_id;
     cmd->buf.addr = ((struct zoid_addr*)op->addr->method_data)->pid;
@@ -1107,4 +1155,13 @@
         return -BMI_ENOMEM;
     }
+    else if (resp.zoid_id == -1)
+    {
+        /* Immediate completion. */
+        assert(!not_immediate);
+        gen_mutex_unlock(&METHOD_DATA(op)->post_mutex);
+        if (length)
+            *length = resp.length;
+        return 1;
+    }
 
     METHOD_DATA(op)->zoid_buf_id = resp.zoid_id;
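
The immediate-completion path for receives copies the contiguous temporary buffer back into the caller's list of buffers. The standalone sketch below illustrates that scatter copy outside of BMI. The function name scatter_copy and its signature are hypothetical, not taken from server.c. It differs from the committed loop in two ways that are assumptions of this sketch: it also stops at the end of the buffer list, and it advances an unsigned char pointer, since arithmetic on void * is a GNU extension rather than ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy up to length bytes from the contiguous buffer src into count
   destination buffers, filling each one in order. Returns the number
   of bytes actually copied, which is less than length only when the
   destination buffers run out of room. */
static size_t scatter_copy(void *const *buffer_list, const size_t *size_list,
                           size_t count, const void *src, size_t length)
{
    const unsigned char *cur = src;
    size_t remaining = length;
    size_t j;

    assert(src != NULL || length == 0);

    for (j = 0; j < count && remaining > 0; j++) {
        /* Take the smaller of this buffer's size and what is left. */
        size_t tocopy = size_list[j] < remaining ? size_list[j] : remaining;

        memcpy(buffer_list[j], cur, tocopy);
        cur += tocopy;
        remaining -= tocopy;
    }
    return length - remaining;
}
```

Returning the copied byte count lets a caller detect a message longer than the posted buffers, a case the loop in the changeset does not check for.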
