Version 5 (modified by walt, 2 years ago)

--

C Programming

Formatting

This section will provide guidelines so that multiple users on a given project can write code with consistent appearance. This makes the code easier to maintain and audit in a group environment.

GPL

{anchor:gpl}

The GPL, or General Public License, is a software license created by the GNU organization ( http://www.gnu.org). You can find out more information about it at their web site. A brief summary is that the GPL insists that the source code be distributed with any software released under the GPL. Furthermore, anyone who modifies and redistributes GPL code must release their modifications under the GPL as well. This is convenient for the research community because it encourages the sharing of ideas and also legally protects your developments.

Any PARL project code that you release to the community should include an electronic copy of the GPL. The easiest way to do this is to create a file in the project's top level directory called "COPYING" which contains the full text of the GPL version 2 as obtained from  http://www.gnu.org. Then in every source code file in the project, include the following text at the very top of the file:

/*
 * (C) 2001 Clemson University.
 *
 * See COPYING in top-level directory.
 */

Other organizations that have contributed to the project may be listed in the copyright line as well (See section \ref{sec:pvfs-copyright} for information on how do this in PVFS code). If you wish to credit particular developers or provide contact information, please do so in the README file located in the top level directory.

Other source code header information

In addition to the copyright comments, it is usually helpful to provide a brief description of what is contained in the each source file. This should just be a few summary lines below the copyright information.

Commenting

\label{sec:comments}

Commenting your code effectively is very important! Please comment important sections of your code clearly and concisely as you write it. The habit of commenting after completing the code often leads to poor comments.

Do not use c++ style comment delimiters ( // ) in c code. Some c compilers do not accept this as a comment delimiter, and it is not a part of the c language specification.

For single line comments (or brief comments trailing a line of code), just use the /* and */ delimiters. If the comment is longer than one line, use this format:

/* This code does lots of cool things.  It is also written perfectly and
 * will never break.  It is fast, robust, extensible, and resistant to
 * rust and corrosion.
 */

This makes it easy to tell where the comment begins and ends.

Comments that describe the operation of a particular function should be listed just above the function definition, not the prototype. The comment should give the function name, what it does, what any potential side effects are, and the range of return values. This is one example:

/** MC_finalize()
 *
 * This function shuts down the method control subsystem.  It is
 * responsible for tearing down internal data structures, shutting down
 * individual method devices, and gracefully removing any unfinished
 * operations.
 *
 * returns 0 on success, -errno on failure
 */
int MC_finalize(void)
{
   ...
}

If you are working on the PVFS project, then you should adhere to the function comments described in section \ref{sec:pvfs-comments}.

Brackets

Brackets are of course used to delineate blocks of code contained within loops, conditional statements, or functions. For clarity, any statement executed within a conditional or loop should be enclosed in brackets, even if it is just one line. For example:

if(something true)
{
   do something;
}

and not

if(something true)
   do something;

Also note that each bracket gets it's own line in the source code.

Indentation

\label{sec:indent} Indentation is also very important to writing clear code. The easiest rule to remember is that any new set of brackets should add a level of indentation for the code contained within it. This holds for functions, loops, and conditionals. The following is an example:

int foofunction(int x)
{

   int y = 0;

   if(x <= 0)
   {
      do some stuff;
   }
   else
   {
      for(y=0; y<x; y++)
      {
         do lots of stuff;
      }
   }

   return(0);
}

Hints for writing maintainable code

General code layout

\label{sec:proto}

These are a few general guidelines for how to organize your code:

  • Group similar functions together into the same .c file.
  • If a function will only be called from within the .c file where it is defined, then include the prototype for the function in the same .c file near the top. (see section \ref{sec:static} for information on static declarations)
  • If a function will be called from outside of the .c file in which it is declared, then put the prototype in a header file separate from the .c file. This header should be included in any other .c file where the function will be called.
  • Put comments describing the behavior of the function just before its definition, not with the prototype (see section #commenting for more detailed information about commenting functions).
  • Header files should only contain prototypes and structures that are needed by external pieces of code. It helps to encapsulate things by not providing extraneous information in the header files.

Length of functions

Try not to make extremely long functions. A good rule of thumb is to limit your functions to 100 lines or less. If a function is longer than this, then it should probably be broken apart into smaller subfunctions. Exceptions to this rule are rare.

Preventing double inclusion

If you are using a header file in several locations, it is easy to create a situation in which the same header file is indirectly included twice in a single compilation. This causes compilation errors because of function, variable, or type redefinition. In order to ensure that this does not happen, you should always wrap your header files in preprocessor macros that prevent the code from being read more than once by the compiler. This may be done by creating a special define that can be detected the second time the code is included. The name of this define should stand out so that it does not conflict with other variables or definitions in your code. It is usually safe to pick the filename of header, convert it to all uppercase, and replace punctuation with underscores. Here is an example for a header file called bmi.h:

/*
 * (C) 2001 Clemson University and The University of Chicago
 *
 * See COPYING in top-level directory.
 */

/** \file This file contains the primary application interface to the BMI
 *  library.
 */

#ifndef __BMI_H    /* these macros tell the compiler to skip the */
#define __BMI_H    /* following code if it hits it a second time */

/* now do whatever you would normally do in your header: */

#include<bmi_types.h>

struct foo{
   int x;
   int y;
};

int foo_function(double a, double b);

/* don't forget to end your header with this statement */
#endif /* __BMI_H */

Static declarations

\label{sec:static} Any function or variable that is declared global in a particular .c file but not referenced in any other .c file should be declared static. This helps to keep the symbol name space from becoming cluttered. It also insures that local functions are not accidentally called somewhere that they were not intended to be called. ====Initializing variables}

Initialize all variables when they are declared in your software. Even if it is a trivial scalar variable, go ahead and initialize it. Integers and floats can typically be initialized to -1 or 0, while pointers can be initialized to NULL. This simple habit can help uncover many problems that occur when the validity of a value is not checked before it is used. There is no guarantee what the value of a variable will be when it is created. Picking a known initial value to start out with can prevent garbage data from being interpreted as valid information.

A similar argument applies to memory regions that are dynamically allocated. Any dynamically allocated structure or variable should at least be zeroed out before being used in the code. This can be done with the memset() function:

foopointer = (struct foostruct)malloc(sizeof(struct foostruct));
if(foopointer == NULL)
{
   /* alloc failed */
   return(some error value);
}
memset(foopointer, 0, sizeof(struct foostruct));

If there are sentinal values other than 0 for elements contained in your struct, they should be set as well.

Allocating and deallocating complex structures

If there is a particular structure that you are frequently dynamically allocating or deallocating, it usually pays off to go ahead and create functions to handle those operations. This is especially helpful if there are further dynamically allocated structures within the original structure. Encapsulating all of this memory management in a pair of functions aids in debugging and makes your code more readable overall. A good naming convention is:

/* returns a pointer to new structure on success, null on failure */
struct foo* alloc_foo(void);

and

/* no return value */
void dealloc_foo(struct foo*);

Keeping up with work in progress

There are often questionable issues, or even issues that you don't have time to deal with at the moment, that come up when writing large pieces of code. It is generally helpful to document these questions or "todo" items in a known location so that they are not forgotten. There are two recommended ways of handling this. Keep larger or more imporant items listed in a file called "TODO" in the top level directory of your project. This file can be added to CVS so that other developers can see a quick list of known bugs or issues that need resolution. As items on this list are corrected, you may wish to log them in another file at the top level called "Changelog". Smaller issues, that are perhaps only important from a stylistic point of view, can be commented in the code and marked with the text string "TODO" within the comment. This is highlighted with a special color with vi syntax highlighting, and can easily be found with the grep tool later.

Choosing good variable and function names

Try to pick descriptive names for variables and functions, rather than saving keystrokes by picking obtuse abbreviations. This makes it easier for people who look at your code afterwards to understand what is going on. If your function or variable name is comprised of more than one word, then separate the word with underscores. If a collection of functions are related, or collectively form a common interface, the prepend an identifier to each function so that it is obvious that they belong together:

int test_control_open();
int test_control_close();
int test_control_read();

Function and variable nameing issues specific to PVFS can be found in section \ref{sec:pvfs-naming}.

Advanced topics

==== Checking for interrupted system calls====

If a system call fails, always check the return value to see if it was set to EINTR. If this happens, it means that the system call was interrupted by a signal and probably did not actually fail; it just needs to be restarted. This is a fairly common situation when doing reads, writes, or polls. You can restart operations either by wrapping them in a while loop that causes it to try again if EINTR occurs, or you can use a goto and a label to jump back to the the system call you wish to repeat.

Constant arguments

If you are passing in pointers as arguments to a function, but do not wish for the value contained in the pointer to be modified, then it is sometimes helpful to make the argument declaration a constant. This makes the compiler present a warning or an error if the value is accidentally modified within your function. This technique is especially useful when one of the arguments to your function is a string. In this case, you will probably be passing in a char* argument for convenience. However, passing in a string in this manner allows the function to modify the argument, which may not be desirable. Using a const char* argument can prevent this. Example:

int string_key(const char *key, const char *id_string)
{
   /* within this function it is now impossible to accidentally modify 
    * the character strings pointed to by key or id_string
    */
   return(0);
}

Obscure coding practices

By all means, try to avoid the use of obscure coding tricks when writing software as part of a group. This especially true when there is there is an equally valid but much clearer method of accomplishing your goal. Obscure coding practices include but are not limited to:

  • the : ? conditional operator
  • uneccessary goto statements
  • nested switches
  • implicit type conversion
  • placing too much emphasis on making code small

Locking data structures

If you are programming in a multithreaded or reentrant environment, it is very important to use locking mechanisms effectively. Any global variable should be locked before it is accessed in this type of environment. The pthread library contains almost any sort of portable primitive you may need for a single application. It is also helpful to wrap these calls behind an interface that allows you to turn locking on or off at compile time. The ability to disable locking can be useful during development or when running code on a system that does not require locking. Look in the pvfs-locks CVS module for an example.

Select vs. poll

Try to avoid using the select system call and use poll in its place. Poll scales more efficiently. It is also the most direct function call for accomplishing the desired task on modern Linux kernels because select is implemented on top of the kernel's poll function.

String parsing

Be careful with regards to which functions you use when doing simple string parsing. Some of the functions provided in {{string.h}}} are dangerous to use, either because they do not return error values, or because they alter their arguments. Most of these issues are documented in the man pages. One common example occurs when an integer value must be read out of a string. In this case, it is better to use sscanf than atoi:

   char number_string[] = "300";
   int my_number = -1;
   ret = -1;

   /* if you use sscanf, you can check the return value */
   ret = sscanf(number_string, "%d", &my_number);
   if(ret < 1)
   {
      return an error;
   }
   /* as opposed to atoi, which will not tell you if it fails */
   my_number = atoi(number_string);

Abstraction

When you are designing new interfaces, think carefully about how to create an abstraction for what you want the interface to do. The important idea here is to not be tied down to a particular implementation below your interface because you made the interface too restrictive. For example, suppose that you wish to create an interface for storing and retrieving a large number of independent objects. One way to implement this may be to use a hashing function. However, most people consider it to be much quicker to get a simple linked list working. If you abstract the interface correctly, you can implement the functionality with a linked list for now just to get your program working and then come back later and plug in a hash table implementation. This is only possible with a good abstraction, however. If your first interface has functions such as "add_to_head" or "create_new_list" that pass around pointers to lists, then it will of course be difficult to change this interface to use a hash table. It would be better to use functions such as "store_item" or "create_new_container" and use opaque types to keep up with your data structure.

Function pointers

Function pointers can be useful when creating modular code. They allow you to pick which function will be used to perform a given task at run time rather than compile time. This is not really any harder than manipulating pointers to variables:

/* this is the first way to send a message */
int send_message_one(void* data, int size);

/* this is the second way to send a message */
int send_message_two(void* data, int size);

/* this is a pointer to the preferred method */
int (*send_message_generic)(void*, int) = NULL;

...

if(something is true)
{
   send_message_generic = send_message_one;
}
else
{
   send_message_generic = send_message_two;
}


/* We don't care which method the user chose.  We know that it can be
 * accessed through this function pointer without us modifying our code.
 */
send_message_generic(my_data, sizeof(my_data));

Typedefs and opaque types

Choosing appropriate types for objects passed around in your code can be very important in some situations. There are a couple of different issues here:

  • Platform dependence: Different architectures use a different number of bytes for some variable types. This means that it can sometimes be very helpful to explicitly choose the size of some variables to aid portability. This is especially true if the data is going to passed over a network, although there are more issues (such as big-endian vs. little-endian) to worry about in those situations. It is often a good idea to use typedefs to create new type names that have a known, fixed size:
   typedef int32_t pvfs_flag_t;

This guarantees that when a pvfs_flag_t variable is declared, it will be a 32 bit integer, regardless of the host architecture.

  • Opaque types: Sometimes you wish to have an interface operate in terms of a specific type. If you are not certain of what type should be used for this purpose in the long term, you can hide it behind a typedef'd opaque type. That way, if you change the type later, you may not have to change every reference to it in the code. You just have to change the initial typedef statement. This can be done for structs or scalar types.

Guess I need an example here...

UP