| 1 | #################################################################### |
|---|
| 2 | # TODO list for pvfs2 project as a whole |
|---|
| 3 | # |
|---|
| 4 | # |
|---|
| 5 | |
|---|
| 6 | NOTE: Some (dated) status information can be found in doc/pvfs2-status.tex |
|---|
| 7 | |
|---|
| 8 | improving robustness of I/O APIs: |
|---|
| 9 | ==================================================================== |
|---|
| 10 | - our internal APIs should be able to handle the following cases: |
|---|
| 11 | a) operations posted before initialize() should return error |
|---|
| 12 | b) operations posted after finalize() has started should return error |
|---|
| 13 | c) finalize() should gracefully terminate pending operations, although |
|---|
| 14 | those operations will have undefined results |
|---|
| 15 | - these APIs in particular need updating in that regard: |
|---|
| 16 | - dbpf-attr-cache DONE |
|---|
| 17 | - trove |
|---|
| 18 | - bmi |
|---|
| 19 | - flow |
|---|
| 20 | - job |
|---|
| 21 | - request scheduler |
|---|
| 22 | - device interface |
|---|
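A minimal sketch of cases (a) through (c) above, for a single hypothetical component. Every name here (`example_initialize`, `example_post`, `example_finalize`, the state enum) is invented for illustration and is not an existing PVFS2 symbol. The sketch is single-threaded; a real component would need a lock around the state and the pending count.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lifecycle states for one component (not PVFS2 names). */
enum example_state {
    EXAMPLE_UNINITIALIZED,
    EXAMPLE_RUNNING,
    EXAMPLE_FINALIZING
};

static enum example_state current_state = EXAMPLE_UNINITIALIZED;
static int pending_ops = 0;   /* operations posted but not yet completed */

int example_initialize(void)
{
    if (current_state != EXAMPLE_UNINITIALIZED)
        return -EINVAL;
    pending_ops = 0;
    current_state = EXAMPLE_RUNNING;
    return 0;
}

/* Cases (a) and (b): refuse to post unless the component is running. */
int example_post(void)
{
    if (current_state != EXAMPLE_RUNNING)
        return -EINVAL;
    pending_ops++;
    return 0;
}

/* Case (c): stop accepting work, then drop whatever is still pending.
 * Returns the number of operations that were terminated, or a negative
 * errno value if the component was not running. */
int example_finalize(void)
{
    int terminated;

    if (current_state != EXAMPLE_RUNNING)
        return -EINVAL;
    current_state = EXAMPLE_FINALIZING;   /* new posts now fail */
    terminated = pending_ops;
    pending_ops = 0;
    current_state = EXAMPLE_UNINITIALIZED;
    return terminated;
}
```

Posting before `example_initialize()` or after `example_finalize()` returns `-EINVAL` instead of touching uninitialized state.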
| 23 | |
|---|
| 24 | server operations: |
|---|
| 25 | ==================================================================== |
|---|
| 26 | - not started: |
|---|
| 27 | - eattrib (set/get) |
|---|
| 28 | |
|---|
| 29 | - unfinished: |
|---|
| 30 | - general error handling |
|---|
| 31 | - performance monitoring (need more metrics) |
|---|
| 32 | |
|---|
| 33 | general server functionality: |
|---|
| 34 | ==================================================================== |
|---|
| 35 | - attributes (permissions, etc.) on datafiles |
|---|
| 36 | - finishing file system semantics documentation |
|---|
| 37 | - don't forget to define semantics for access times |
|---|
| 38 | |
|---|
| 39 | request scheduler: |
|---|
| 40 | ==================================================================== |
|---|
| 41 | - more generic implementation |
|---|
| 42 | - smarter concurrency rules |
|---|
| 43 | |
|---|
| 44 | system interface functionality: |
|---|
| 45 | ==================================================================== |
|---|
| 46 | - not started: |
|---|
| 47 | - eattrib (set/get)? |
|---|
| 48 | |
|---|
| 49 | - unfinished: |
|---|
| 50 | - thread safety |
|---|
| 51 | - way to pass in consistency semantics (timeout values, etc.) |
|---|
| 52 | |
|---|
| 53 | - define how configuration info should be passed in |
|---|
| 54 | (how to do paths, fstab, url stuff, whatever) |
|---|
| 55 | - define how to pass in distribution and number of datafiles for |
|---|
| 56 | cases in which the caller wants to override the defaults |
|---|
| 57 | - add nonblocking api for some functions |
|---|
| 58 | - clean up API (in particular fstab parsing / initialize path, and removal of |
|---|
| 59 | deprecated terminology) |
|---|
| 60 | - declare input pointer arguments to the system interface as const |
|---|
| 61 | - make sure that system interface functions return an error, rather than |
|---|
| 62 | asserting, if the caller tries to operate on a bogus handle (one case occurs |
|---|
| 63 | in assertions following PINT_bucket_map_to_server()) |
|---|
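The bogus-handle item above could look roughly like the following. This is a sketch only: `example_map_to_server`, `example_handle`, and the range table are hypothetical stand-ins, and the real signature of PINT_bucket_map_to_server() is not assumed here. The point is that a lookup miss becomes a return code the caller can propagate, rather than an assertion failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle type and null value, for illustration only. */
typedef uint64_t example_handle;
#define EXAMPLE_HANDLE_NULL ((example_handle)0)

/* Hypothetical table mapping handle ranges to server indices. */
struct example_range {
    example_handle first;
    example_handle last;
    int server;
};

static const struct example_range example_table[] = {
    { 1,    1000, 0 },
    { 1001, 2000, 1 },
};

/* Look up the server that owns handle h.
 * Returns 0 and fills *server_out on success, -EINVAL for a null
 * handle or null output pointer, and -ENOENT if no range matches. */
int example_map_to_server(example_handle h, int *server_out)
{
    size_t i;
    size_t n = sizeof(example_table) / sizeof(example_table[0]);

    if (server_out == NULL || h == EXAMPLE_HANDLE_NULL)
        return -EINVAL;

    for (i = 0; i < n; i++) {
        if (h >= example_table[i].first && h <= example_table[i].last) {
            *server_out = example_table[i].server;
            return 0;
        }
    }
    return -ENOENT;   /* bogus handle: report it, do not assert */
}
```

Callers would check the return value and pass the error up, which also keeps the path safe when assertions are compiled out.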
| 64 | |
|---|
| 65 | kernel/vfs interface |
|---|
| 66 | ==================================================================== |
|---|
| 67 | |
|---|
| 68 | performance tuning: |
|---|
| 69 | ==================================================================== |
|---|
| 70 | - instrumenting |
|---|
| 71 | - steal what we can from mpich2 |
|---|
| 72 | - architecture specific locking, etc. |
|---|
| 73 | - thread tuning |
|---|
| 74 | - memory allocation cache |
|---|
| 75 | - do some benchmarking of thread context switches to help decide |
|---|
| 76 | how trove/job/flow interfaces should interact |
|---|
| 77 | - figure out how to make i/o faster |
|---|
| 78 | |
|---|
| 79 | request encoding: |
|---|
| 80 | ==================================================================== |
|---|
| 81 | - come up with a mechanism for handling requests that go beyond |
|---|
| 82 | the BMI defined limit for unexpected messages (mainly an issue |
|---|
| 83 | on read/write with complex datatypes, but also potentially a |
|---|
| 84 | problem on setattr) |
|---|
| 85 | |
|---|
| 86 | error codes: |
|---|
| 87 | ==================================================================== |
|---|
| 88 | - converting to new error code format (everywhere) |
|---|
| 89 | - documenting valid error codes from functions |
|---|
| 90 | |
|---|
| 91 | I/O path: |
|---|
| 92 | ==================================================================== |
|---|
| 93 | - buffer cache on top of trove |
|---|
| 94 | - clean up buffer management in BMI to be more useful for I/O buffer |
|---|
| 95 | cache, maybe push to a separate component |
|---|
| 96 | - optimizing small reads and writes (packing data into req/ack messages) |
|---|
| 97 | - native GM flowprotocol |
|---|
| 98 | - general optimizations (lock granularity, immediate completion, etc.) |
|---|
| 99 | - ability to unpost, correct use of timeouts, preposting operations |
|---|
| 100 | - semantics of short read and write operations |
|---|
| 101 | - bmi_tcp scalability and robustness |
|---|
| 102 | - ability to toggle sync behavior in trove |
|---|
| 103 | - use better buffer size in default flow protocol |
|---|
| 104 | - bmi shmem implementation |
|---|
| 105 | - many items in BMI and flow TODO files |
|---|
| 106 | - ability to compile out device support, or at least prevent device thread |
|---|
| 107 | from spawning if not used |
|---|
| 108 | - ability to fail over with multiple bmi transports |
|---|
| 109 | |
|---|
| 110 | correctness/performance testing |
|---|
| 111 | ==================================================================== |
|---|
| 112 | - a comprehensive test suite of the system interface API |
|---|
| 113 | - more pts tests |
|---|
| 114 | - profiling code paths |
|---|
| 115 | - eliminate memory leaks |
|---|
| 116 | - handle server or client failures in a reasonable way (log and exit instead |
|---|
| 117 | of segfault, perhaps) |
|---|
| 118 | |
|---|
| 119 | system management utilities |
|---|
| 120 | ==================================================================== |
|---|
| 121 | - pvfs2-fsck (serial tool done, evolve into parallel tool) |
|---|
| 122 | - decide what we want/need here? |
|---|
| 123 | - health monitoring |
|---|
| 124 | - system recovery |
|---|
| 125 | - system statistics (raid stat, mem used, etc.) |
|---|
| 126 | - etc. |
|---|
| 127 | - performance monitoring: |
|---|
| 128 | - more metrics |
|---|
| 129 | - more viz tools |
|---|
| 130 | - end user documentation |
|---|
| 131 | - better logging systems |
|---|
| 132 | - maybe make pvfs2-ping compute a cksum on the fs.conf from all |
|---|
| 133 | servers and issue a warning if they don't all match? |
|---|
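One possible shape for the fs.conf checksum idea above, as a sketch. It assumes each server's fs.conf contents have already been fetched into memory; how pvfs2-ping would obtain them is left open. The checksum is 32-bit FNV-1a, chosen here only because it is short; the TODO item does not name an algorithm. `example_cksum` and `example_first_mismatch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a byte buffer (algorithm choice is an assumption). */
uint32_t example_cksum(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;   /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;         /* FNV prime */
    }
    return h;
}

/* Compare every server's checksum against server 0.
 * Returns the index of the first server that differs, or -1 if all match. */
int example_first_mismatch(const uint32_t *sums, size_t nservers)
{
    size_t i;

    for (i = 1; i < nservers; i++) {
        if (sums[i] != sums[0])
            return (int)i;
    }
    return -1;
}
```

A non-negative result from `example_first_mismatch()` would be the trigger for the warning.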
| 134 | |
|---|
| 135 | documentation: |
|---|
| 136 | ==================================================================== |
|---|
| 137 | - come up with an automated way to document the wire packet format |
|---|
| 138 | - also document headers that bmi tacks on, at least for bmi_tcp |
|---|
| 139 | - update the coding guidelines |
|---|
| 140 | - document config file options |
|---|
| 141 | - automate faq publishing |
|---|
| 142 | - mechanism for exporting to html |
|---|
| 143 | - update all design docs! |
|---|
| 144 | - review |
|---|
| 145 | |
|---|
| 146 | code cleanup: |
|---|
| 147 | ==================================================================== |
|---|
| 148 | - remove some of the stuff from the test subdir for "make dist" target |
|---|
| 149 | - in particular, test/common (partial), test/io, test/proto, test/server |
|---|
| 150 | - put in header file wrappers to make them work with c++ |
|---|
| 151 | - audit code to make sure that all error paths are handled when |
|---|
| 152 | assertions are turned off |
|---|
| 153 | - maybe make a checklist for each pvfs2 component to use as we clean |
|---|
| 154 | up each section of the code? (items to check for each component |
|---|
| 155 | could include stuff like symbol names, PVFS_error code usage, |
|---|
| 156 | proper error handling when assertions are off, etc.) |
|---|
| 157 | - consistent formatting |
|---|
| 158 | - consistent function naming |
|---|
| 159 | - consistent header file inclusion |
|---|
| 160 | - come up with more named values like TROVE_HANDLE_NULL to use in |
|---|
| 161 | other parts of the code |
|---|
| 162 | - try to clean up flow / I/O path some, in particular so we don't have |
|---|
| 163 | to do so much mallocing to set up from client side |
|---|
| 164 | - maybe do things like embed file_data struct in flow desc. |
|---|
| 165 | - make permission checking in prelude.sm neater, maybe assert on |
|---|
| 166 | unknown op types so we don't forget to add new ones here |
|---|
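For the C++ header wrapper item above, the usual pattern is an `extern "C"` block guarded by `__cplusplus`. The sketch below uses a made-up header guard and function (`EXAMPLE_API_H`, `example_add`); the definition is shown inline only so the fragment is self-contained, and would normally live in a .c file.

```c
#include <assert.h>

#ifndef EXAMPLE_API_H
#define EXAMPLE_API_H

/* When included from C++, give the declarations C linkage so the
 * names are not mangled and link against the C library objects. */
#ifdef __cplusplus
extern "C" {
#endif

int example_add(int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_API_H */

/* Definition, normally in a separate .c file. */
int example_add(int a, int b)
{
    return a + b;
}
```

A C compiler never defines `__cplusplus`, so the wrapper has no effect on existing C callers.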
| 167 | |
|---|
| 168 | fault tolerance: |
|---|
| 169 | ===================================================================== |
|---|
| 170 | - what does the API look like |
|---|
| 171 | - data redundancy |
|---|
| 172 | - failover |
|---|
| 173 | |
|---|
| 174 | testing: |
|---|
| 175 | ===================================================================== |
|---|
| 176 | - run common test programs and benchmarks, like: |
|---|
| 177 | - flash |
|---|
| 178 | - iozone |
|---|
| 179 | - dbench |
|---|
| 180 | - ior |
|---|
| 181 | - bonnie |
|---|
| 182 | - make kernel |
|---|
| 183 | - mpiiotest |
|---|
| 184 | - John May's tests? |
|---|
| 185 | - piobench |
|---|
| 186 | - more pts tests |
|---|
| 187 | - more datatype testing |
|---|
| 188 | - remember example of ub < lb |
|---|
| 189 | |
|---|
| 190 | rob's random list: |
|---|
| 191 | ===================================================================== |
|---|
| 192 | - do something about the weird PINT_sys_wait and PINT_mgmt_wait macros in |
|---|
| 193 | client-state-machine.h |
|---|