The C Language (416 Abridged)


Title The C Language (416 Abridged)
Author Daniel Kim
Date Jan. 28, 2021

Introduction

This is an abridged C language book (Can we really call it a book when it's in a markdown file?), focusing on aspects of C that are used more often in CS416.

While the K&R C Language book is great (think CLRS in algorithms but for C), it's more than 155 pages for a language with similar syntax to Java for basic things and doesn't really give tips and tricks on useful techniques that could make your life a lot easier.

It's still recommended that you go through the K&R book if you have time since it goes into much more detail.

The Basics

We'll be going through this fairly quickly since most of these are things you should already know or will be familiar with from other languages.

The main() Function


int main(int argc, char** argv){
    return 0;
}

No more public static void or String[] args. Just return type, main, and parameters for command line arguments.

argc gives you the number of command line arguments passed through when running your program (For example: $./test.o arg1 arg2 will have argc=3).

argv is your list of arguments.

argc will always be at least 1. Why? The name of the program (in our case it's test.o) is considered an argument. Consequently, it's also always the first argument in argv.

Remember to return something every time. This isn't a void function since the return value from main is used to determine error codes of your program on exit. Anything that isn't zero is considered an error.

Including Other Packages


#include <package.h>
#include "some/path/to/package.h"

int main(...){}

Two ways of including packages:

  1. The first is any installed packages/libraries on the machine which are found in your "search path" (probably /usr/include/ on the iLabs).
  2. The second way is to include any packages that aren't in your search path - you have to give the path to the header file (can be relative or full path).

For the projects in this class, you most likely won't need to worry about figuring out which packages to include since we already give them to you. You will need to include any header files that you make yourself though.

Printf


#include<stdio.h>

int main(...){

    printf("<format>", arg1, arg2, ...);
    return 0;
}

Yep, you need to include a package to use printf(). This should be simple enough - you provide a format string, and for every format specifier in it you provide a matching argument (e.g. printf("%d\n", 5); where %d is the specifier for ints and 5 is arg1).

Types


Some standard primitives you'll be using in this class:

  • int
  • char
  • float
  • long
  • unsigned int (or unsigned long, unsigned char, etc. - float and double have no unsigned variant)

Some types you might see popup from stdint.h:

  • int32_t (32-bit int)
  • uint32_t (unsigned 32-bit int)

Why do we need these other types? Not every machine follows the same conventions. An int on the iLabs is 4 bytes, but that's not guaranteed for every machine out there (honestly most machines agree on this, but better safe than sorry).

Anything With Conditionals


We'll be going through these quickly since these are straightforward

//if-else statement
if(conditional){}
else if(conditional){}
else{}

//all the loops
do{}
while(conditional);

while(conditional){}

int i; //Since C99 you can also declare i inside the loop header. GCC's default standard (gnu17?) allows it, but the iLabs might have it set to something else
for(i=0; i < x; i++){} 

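conditional here is any expression that evaluates to a number or pointer: zero counts as false and anything non-zero counts as true (C only got a bool type in C99, through stdbool.h).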

Some Useful Syntax Using Conditional Parameters


Something cool you can do is initialize values inside a conditional parameter and use it as part of your conditional. This might save you some lines.

int x;
if((x = foo()) == 0){
    ...
}

while((x = bar()) < MAX){
    ...
}

Operators


Nothing out of the ordinary here. These are the same as Java. We'll get into bitwise operators later.

Functions


We'll only be going over function headers here and save the rest for pointers

int foo(){
    return 0;
}

void bar(){
	return;
}

There's no public, private, protected, etc. Those concepts come from the OOP in Java. Just a return type, your function name, and parameters.

Just as a word of warning, sometimes the compiler lets you compile a program with functions that don't return anything despite having a non-void return type. Compiling with -Wall (with gcc) will warn you about this, but it isn't a required flag (and sometimes gets annoying).

Function Prototypes


USE THESE - not enough students use function prototypes - though for the most part the projects will force you to use them one way or another (because header files).

Function prototypes are useful since they let you globally reference functions, regardless of what order you write them in.

Why does this matter? Well, unlike Java, C doesn't let you write functions in any order and use them.

int main(...){

    foo(5, 'f');  //The compiler will complain that foo() doesn't exist or there's an implicit declaration of a function.
    return 0;
}

void foo(int a, char b){}

So how do we fix this? Write a function prototype before all of your functions. We'll get into how to clean this up later in header files for when you have too many function prototypes.

void foo(int, char); //you don't have to put the names of the parameters, just the types

int main(...){
    ...
}

void foo(...){}

You could also just write every function above the one you call it in, but function prototypes have other uses (again we'll expand on this in header files).

The Essentials

The next few sections focus on the more important aspects of C that we'll be using a lot in this class (and in general if you're a C programmer).

Structs


NOT objects. They aren't all that similar either. Structs don't belong to a class and they also don't hold any functions, which is key in objects. Structs are, however, abstractions of a group of things.

Structs hold "members" which are the data types that are valid to store in the struct itself.

struct node{
    int data;
    struct node *next;
}; //don't forget this semi-colon

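The members of the struct (data and next) are the only valid data that you can put into this struct node. In memory, the size of a struct is at least the sum of the sizes of its members; the compiler may add padding between or after members to keep them aligned, so always use sizeof instead of adding the sizes up by hand.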

Struct initializing and Referencing struct members (without pointers)

int main(...){

    struct node head = {.data = 4, .next = NULL}; //initializing like this might depend on the standard you're using
    //This is a more explicit way to declare and initialize
    struct node x;
    x.data = 5;
    x.next = NULL;

    //come back to this after reading pointers,
    //dynamic allocation and sizeof
    //Ignore for now.
    struct node *ptr = (struct node*)malloc(sizeof(struct node));
    ptr->data = 5;
    ptr->next = &head;


    int val = head.data; //referencing a member
    int ptr_next_val = ptr->next->data; //==val (ptr->next is a pointer, so it needs -> too)

    free(ptr); //every malloc() gets a free()
    return 0;
}

It's important to note that structs that are declared but not initialized might not always be zeroed out. They can contain garbage values so you'll want to initialize structs every time to ensure correctness - though this isn't completely necessary.

You can also make locally defined structs. Just do the same syntax inside a function. These would be unavailable for the world outside this function.

Enums and Unions


This is condensed into one section since you won't be using these all that much, but you should know them anyway because they still have their uses outside of the class.

Enums

Enums are just ways to give names to numbers so that your program is easier to read and maintain. In general, giving a name to a value is useful, especially if you use that value a lot for the same things (MACROS).

There's a few ways to declare these but here's the simplest one.

//classic example
enum months{
    JAN, //note this starts from 0, not 1, unless you assign a value yourself (JAN = 1)
    FEB,
    ... //etc.
}; //enumerators are separated by commas, and the definition ends with a semi-colon

Unions

Unions are a special type of structure. The difference between a union and a struct is that all members of a union share the same memory, so the size of a union is the size of its largest member. Unions are usually used for statuses or flags, but really unions are generally good for saving space with anything that requires a large number of variables but doesn't need to use any two at the same time.

union flags{
    int flag1;
    double flag2;
}; //same as structs - don't forget the semi-colon

The size of the union in this case is 8 bytes (size of a double). If you populate flag1 with a value, you will only be using the lower 4 bytes of the union since an int is only 4 bytes. Be careful, if you populate flag2 afterwards, you will lose the value of flag1. Similarly, if you populate flag2 first, and then populate flag1, you will change only the lower 4 bytes, so if you access flag2 again, you'll get a different (or corrupt) value.

Pointers


A bit different from Java references. You can do a lot more with these. In other words, this is how you accidentally shoot yourself in the foot if you aren't careful. Pointers are similar to references in the sense that... they point to things.

Maybe a better explanation would just be to show a practical example.

int main(...){

    int y = 5;
    int* x = &y; //most people do int *x but this is preference

    int* a, b, c; //defines a(pointer) and ints b, c
    int *d, *e, *f; //defines 3 pointers

    *x = 6;

    return 0;
}

int* x is declaring our pointer and we're assigning it to a reference of y - in other words, the address of y. When we want to change the value that x is pointing to we dereference it using *. Changing the value that x is pointing to will also change the value of y. So in this case, we dereference x, which is pointing to y, and change it to 6 - y is now the value 6.

If we change the value of x itself, we run into problems that we may not want. Remember that C lets you do a lot of things, and this is one of them - changing what the pointer is pointing to isn't necessarily wrong, you just have to know what you're changing it to. Since x is holding an address, changing that will make x point to another address. You could end up pointing to unwanted data and crash your program, or worse, create security vulnerabilities. This could be useful for arrays, which we'll get into later, but in general you shouldn't change the value of a pointer unless you mean to - you should only change the value that it's pointing to.

Pass by Reference


C is a pass by value language, meaning anything you pass through a function will keep its value even if you change it in a function. That is, unless you use pointers (in case you didn't know, Java is exactly the same with its references).

void foo(int a){
    a = 6;
}

void bar(int* a){
    *a = 6;
}

int main(...){

    int a = 9;
    foo(a);
    bar(&a);
    
    return 0;
}

a in foo() won't change the value of a in main(). bar(), however, will since bar() takes in the address of a and dereferences it in order to change the value.

You might remember why from CS211, but in short, a function makes a copy of a pointer or variable when it's passed through, so what you're really changing is that copy if you don't pass in a reference to it.

The same applies if you want to change the pointer itself (useful for arrays). You'll have to pass a pointer to the pointer.

void foo(int** a, int** b){
    *a = *b;
}

int main(...){

    int z = 1;
    int* b = &z;
    int** a = &b; 
    foo(a, &b);

    return 0;
}

This isn't the most practical example (both arguments end up referring to the same pointer b, so *a = *b just assigns b to itself), but it should get the point across. foo() writes through a, a pointer to a pointer, to change the pointer that a refers to. We have to pass the address of b because foo() only ever receives copies of its arguments: assigning to a copy inside foo() changes nothing in main(). For the same reason, never store the address of a parameter itself - the copy stops existing when the function returns, so you'd end up pointing to a garbage value later.

Useful Practices with Pass by Reference


This technique will be particularly useful in your projects when initializing values or changing values within functions all while avoiding overuse of pointers or when you want to obtain multiple return values from a function.

Side note: In general, I hate using pointers (and malloc) when I don't have to - particularly with structs since it's more annoying to use '->' instead of '.'. Or when I want to get multiple return values from a function I'd rather not dereference them every time I use them.

For example, I use this often for initializing:

void init(struct node* x){

    x->data = 0;
    x->next = NULL;
    //Oh did I mention that accessing members in a struct is different with pointers?
    //Instead of using '.' you use '->'
}

int main(...){

    struct node x; // note that I didn't use a pointer here when declaring
    init(&x);
    return 0;
}

Multiple return values from a function:

int foo(int* a, int* b, char* file_path1, char* file_path2){

    *a = read(...);
    *b = read(...);
    return 0;
}

int main(...){

    int a;
    int b;
    int status = foo(&a, &b, "some/path", "another/path");
    return 0;
}

Why is this one in particular useful? In this case, we can now get how many bytes read from each file and get a status value from the function (if we had conditionals checking for failure in the function, but in this case it always gives 0). Extremely useful when debugging a program and it also saves on extra function calls or sometimes on lines of code. Get familiar with using these examples - they'll be useful in any C program you write.

Malloc and Free


How to dynamically allocate memory for anything.

Let's get one thing clear - you malloc() something, you free() it. Don't be lazy just because a class doesn't check for memory leaks. If you don't want to get familiar with this practice, then don't even think about programming in C/C++.

#include<stdlib.h> //make sure to include this. malloc is from this package.
int main(...){

    int* a = (int*) malloc(4);
    free(a);
    return 0;
}

We'll get into complex uses of malloc() in arrays and some simplifications in the sizeof section. This section is going to show what exactly malloc is doing (in the scope of this review - we'll go over more things about malloc() in CS416 but there are even more things that we don't cover in the class. Shameless plug for CS519 - OS Theory).

In this example we have a pointer a getting a dynamically allocated space of 4 bytes. malloc() takes in a value for how many bytes you want to allocate - and it doesn't care what type it is. Remember that C is a language about memory and data management and every piece of data and memory is just a bunch of bits and bytes to your computer (and to C somewhat). It doesn't care what the data is or what it looks like, it just needs to know what you want to do with it (we'll expand on this in the casting section).

Since malloc() doesn't care about type, it returns a void*, so we should cast it to the type that we want it to be. I say should because C converts void* to other pointer types on its own, so the cast isn't strictly necessary, but it documents the type you intend (and C++ requires it).

Now in this example, it isn't all that useful. We just got 4 bytes allocated to a pointer, which we could've done without using malloc(). When we get to arrays, we'll see a much more meaningful example.

As mentioned earlier, when you malloc(), you free() it. Yes, the OS will recollect it all at the end of the program, but that doesn't mean you don't need to free allocations. For example, when you reassign pointers that previously pointed to allocated space, you essentially lost that previously allocated space and now it's just taking up valuable space. Not great if you're working with large data sets that you constantly need to load in and out of memory. Not only that but you open your program up to security vulnerabilities. If you're too lazy to free() see if there's a way for you to write certain parts of your code without malloc().

An important difference between malloc() and ordinary local variables is that dynamically allocated space lives on the heap while local (automatic) variables live on the stack. Knowing this distinction is important for many things (one of them being threading). Refrain from doing large allocations (e.g. arrays) on the stack; use malloc() for those. The pointer variable itself still lives on the stack, even when the memory it points to is on the heap.

A useful tool to check for memory leaks is valgrind. Just be careful, if you have too many stack frames (lots of recursion or forking or both), valgrind slows down significantly (I may or may not have made an iLab unusable for a couple of hours from forking recursively too many times [around 4000 processes] with valgrind in CS519).

Arrays and Strings


We kept arrays for after pointers and malloc() because there are two ways to use arrays (well the second one isn't really an "array" but we can use it as one since they're basically the same thing).

Arrays

#include<stdlib.h>
int main(int argc, char** argv){
    int a[5] = {1, 2, 3, 4, 5}; //allocated on stack
    int* b = (int*) malloc(20); //allocated on heap
    for(int i = 4; i >= 0; i--){
        b[i] = i; //initializing array values
        a[i%4] = i; //just doing something here to show array accesses are the same in both cases
    }
    return 0;
}

The first way to make an array is to statically allocate space and manually initialize it. This makes the size of the array immutable, which is fine if you know what values you want (e.g. a list of delimiters) and you don't need to change the size of the array.

The second way is through malloc() and a pointer. A pointer just points to the starting address of an allocation so that's what we'll use it for here. Notice the number of bytes specified in malloc() - we requested 20 bytes since an integer is 4 bytes and we want 5 of them.

You most likely won't see the first usage too often particularly with integers - they aren't all that convenient either (honestly I forget how to initialize them every time because I don't use them). Maybe with strings you'll use the first one for constant strings.

Aside: Technically a[] isn't a pointer, but it decays into one (a pointer to its first element) whenever you pass it to a function, so a parameter declared as int* a accepts it without complaint. Writing the parameter as int a[5] means exactly the same thing to the compiler - the 5 is ignored - so you have to pass the length separately either way.

Accessing and modifying an array is the same as Java regardless of which array type you use (static or dynamic).

The Danger of C Arrays

C doesn't protect you against going over an array (ArrayIndexOutOfBoundsException in Java). You can compile with -fsanitize=address to catch heap buffer overflows and stack smashing at runtime, and -Werror turns the compiler's warnings into errors (read the article "Smashing the Stack" for why stack smashing and buffer overflows are crucial to know about). If you write past the end of a buffer, C will let you do so. You can even read past the buffer if you wanted to in some cases.

#include<stdio.h>
int main(...){

    int a[5] = {1, 2, 3, 4, 5}; //valid indexes 0-4
    a[5] = 34;
    printf("%d\n", a[5]);
    return 0; 
}

Here's an example of

  1. writing past your buffer and
  2. reading that written value.

Both clearly index out of bounds, but C will let you do this (unless somehow a[5] is beyond your process's stack space or heap space if you made a dynamic allocation - in both cases you'll segfault) - and GCC will still compile it. At best you get a warning at compile time or a "stack smashing detected" abort when you run it.

It is suggested that you go and try to segfault a simple program kind of like this one to find out what not to do. The best way to review is by doing - not reading... and then forgetting.

Strings


C doesn't have a formal way to make or use strings. Instead they're always char arrays with a special ending to the buffer - the null terminator ('\0'). The null terminator lets the program know where the end of the char buffer is (we'll call it a string from now on). Without it, if you try to print out a string, the program won't know where the buffer ends and will keep going until it finds a null terminator. Similar to what we mentioned before, you can read past the end of an array, so C string functions will keep doing that until they hit a zero byte. What this means is that your printed string might contain garbage values, since whatever bytes happen to follow the buffer get printed as characters.

#include<stdlib.h>
#include<stdio.h>
int main(...){
    char a[6] = "hello";
    char* a2 = "hello"; //points to a read-only string literal: unlike a you must not modify it, and getting its size is harder (read sizeof section)
    char* b = malloc(6);
    snprintf(b, 6, "hello"); //don't worry about this. it's just filling in the array - the function also null terminates conveniently.
    free(b);
    return 0;
}

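String a is a regular char array on the stack: you can change its characters but not its size, and it is null terminated for you. Since "hello" is a five letter word, we need 6 spaces to fit the null terminator at the end. a2, on the other hand, points to a string literal, which is immutable.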

In the case you malloc() a string, you will need to null terminate it yourself if you manually fill it in. In this example, snprintf() is used for convenience to fill in the buffer and null terminate it.

A word of warning - not all functions null terminate, particularly the ones from string.h (ironically) - so be wary of that.

Aside: in Microsoft's C library, the only version of this function available historically was _snprintf, which does not null terminate when the output gets truncated. Go Linux.

If you need a list of the buffer functions in stdio.h or string functions in string.h, TutorialsPoint has them in their C documentation (though it's a little outdated since it doesn't have functions like snprintf()).

Sizeof


Another topic we left out until now since we wanted to group all of its examples in one spot instead of spread out in different sections.

sizeof is an extremely convenient operator that can be used for a multitude of things, but its most common usage is in malloc(). In our earlier examples of malloc() we were explicitly putting in the number of bytes we wanted to allocate by doing some simple math. But this is inconvenient and annoying if you need to remember what size a certain type is for every machine (again, this isn't all that relevant anymore but isn't something we should ignore either).

Here are some of the different usages of sizeof

#include<stdlib.h>
#include<stdio.h>
int main(...){

    printf("%zu\n", sizeof(int)); //prints 4 (sizeof yields a size_t, so the specifier is %zu)
    printf("%zu\n", sizeof("hello")); //prints 6 - the null terminator counts
    
    double* x = (double*) malloc(sizeof(double)*4); //allocated 8*4 bytes

    int a[5] = {1, 2, 3, 4, 5};
    printf("%zu\n", sizeof(a)/sizeof(int)); //number of elements in a static array (5)
    free(x);
    return 0;
}

Note in the last example, we get the size of a static array. This way of getting the size doesn't work for dynamically allocated arrays since the sizeof(a), if a were a pointer declared as int* a, would be 8 bytes. In this example though, sizeof(a) gives 20, which is then divided by the size of an int.

Casting


Casting in C works differently from Java. In Java, if you cast data to a certain type, it will try to shape it into that type - hence why it sometimes (all the time) complains that you can't cast a certain type into another.

But remember that C doesn't care. Data is just a bunch of 1's and 0's - it doesn't matter what the shape is. You can turn anything into anything. Literally.

You have a struct that's an abstraction of a node? Cast it to a char* because why not. It'll probably come out with garbage values when you try to print it but you can do it.

A more practical usage you'll see is in your third project. This is critical since it's how you'll be manipulating your virtual addresses.

int main(...){

    unsigned long x = 0xFFFFFF;
    void* ptr = (void*) x; //you will literally be doing things like this in your project
}

Another example is when you're sending raw data through a buffer (for example writing to a socket).

int socket = some_file_descriptor;
int number;
write(socket, (char*)&number, sizeof(int)); //address of "number" then casted to char*

The cast is a little unnecessary here since write() takes a const void* and any data pointer converts to void* automatically, but it makes it obvious that you're treating the int as a bunch of raw bytes.

One more example just to show you that you can really do whatever you want, though not without consequences (so many compiler warnings)

#include<stdio.h>
int main(...){

    int a = (int) "abcd";
    printf("%s\n", (char*)&a); //this will print garbage values and not "abcd" but the point is that C will let you run this.
    return 0;
}

Bit Manipulation


Get used to bit manipulation in Operating Systems. They're everywhere. Bit shifting, masking, toggling, etc. You should have seen this in CS211 and for the malloc project in CS214 (assuming you got your "metadata" down to 2 bytes, which requires you to use bit manipulation).

If you're wondering why it's so important - bit shifting is fast... really fast (just a single instruction fast). Not only that, it allows you to save space keeping track of things. For example, to keep track of allocated blocks in disk (let's say it has 8 blocks), you can use a single int to keep track of the block status. The first bit for the first block, second for the second, etc. (don't worry we'll go over this exact thing later in CS416).

Here's a quick rundown

int main(...){

    int x = 1;
    x = x << 1; //left shift by one. x is now 2. You can shorten this by doing x <<= 1
    x >>= 1; //right shift by one. x is now 1. This one is already shortened syntax
    if(x & 1){} //bitwise AND
    if(x | 1){} //bitwise OR
    if(~x){} //bitwise NOT
    if(x^1){} //bitwise XOR
    return 0;
}

Bit Masking


This is a bit tricky at first but once you practice it enough you'll understand it. What you're doing here is extracting bits out of a value to get only certain bits that you want. Again, this is commonly used when tracking the status of things in your operating system so you'll need to get used to seeing these.

Note: I'm intentionally using unsigned integers here otherwise we won't get the desired result for some of these.

If we had the hex 0xFF (one byte of all 1's) and we wanted only the last 4 bits (going left to right) we would do

uint32_t last_four = 0xFF & 0xF;

0xF is our mask, and this is just 4 bits of 1's. If we bitwise AND, then it'll give us what the last four bits of 0xFF are. If we convert it to binary it looks like this

0xFF: 1111 1111
0xF:  0000 1111
---------------
      0000 1111

If we wanted the first four bits, we would have to right shift 0xFF by 4 bits and then apply the mask with bitwise AND again.

uint32_t first_four = (0xFF >> 4) & 0xF;

A more practical example: (here's a tip in this problem as well. If you're converting from decimal to binary, sometimes it's easier to convert to hex and then write out the binary representation of each hex digit, and vice versa)

uint32_t mask = 33; //00100001 in binary or 0x21 in hex
uint32_t value = 104; //01101000 in binary or 0x68 in hex

uint32_t AND_mask = value & mask; //00100000 or 32 in decimal or 0x20 in hex
uint32_t OR_mask = value | mask; //01101001 or 105 or 0x69 (nice)

Here's how you toggle a bit

uint32_t x = 0b01100101; //binary literal - the 0b prefix is a GCC/Clang extension (standard as of C23); a plain leading 0 would mean octal. 0x65 is the portable spelling
uint32_t toggle = 1;

//let's toggle the 5th bit
x ^= (toggle << 4); //now it's 01110101 
x ^= (toggle << 4); //back to 01100101

Not Essentials But Must Haves

The topics below are really not essential to C programming but are heavily used in and out of this class. If you want to be a good C programmer, these are must haves.

Typedef


A handy concept for shortening type names. Notice how you need to type out struct node every time you declare a variable. This gets really annoying after a while, especially if you malloc structs constantly or write a lot of functions using structs. A typedef saves the typing and can help code readability.

Aside: Linus Torvalds begs to differ.. Pepper ain't he? Tl;dr - don't hide things unnecessarily behind typedefs. Particularly pointers; never do that.

//you can typedef immediately in a struct definition
typedef struct node{
    int data;
    struct node *next; //has to be a pointer - a struct can't contain itself
}node_t; //I'm pretty sure the _t is just convention for structs and integers - Linus is still not a fan

//alternatively
typedef struct node node_t;

//you can also typedef primitives
typedef unsigned int my_uint32;

Macros


My favorite thing about C. You can do so many things with macros (I made a generic typed minheap library using only macros - it was kind of annoying though for reasons outlined below).

The first thing about macros - they are expanded by the preprocessor before the compiler proper runs, so the compiler never checks a macro definition on its own, only the code it expands to. This is because macros are essentially just find and replace definitions. Any instance of a macro name will be replaced with the macro's body, so you should be careful in naming your macros or using common names as variables. In general a good practice is to capitalize all of the letters in macros.

The second thing about macros - they're fast, but only if you use them correctly. Macros are a decent way to avoid function calls which have the disadvantage of creating stack frames. If you call a function a lot in a loop, maybe it's best to use a macro instead. If you have a function that's only one line or has small functionality (for example bit shifting/masking is really fast and doesn't require that much code) you might want to use a macro instead.

Macros aren't for everything though. If your macro is just calling a function, you lose the benefits of using a macro. You could also use inline functions instead of macros but I don't have much experience in using them.

The third thing about macros - you can't easily debug them. Yep, if you use a debugger like GDB and run into a macro, GDB will just step over the code in the macro. This is what made making the minheap super annoying - but it's fast so I guess there's that. There's a compiler flag you can set to prevent this but generally it isn't worth it (and your macros shouldn't be written in an error prone manner anyway).

General good practice with macros - K.I.S.S. (keep it stupid simple). Just a good thing to remember in operating systems sometimes.

//simple macros
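#define N 15
#define KiB 1024 //kibibyte
#define MiB (KiB*1024) //mebibyte - parenthesized so it expands safely inside bigger expressions

//functional macros - wrap every argument and the whole body in parentheses
#define SUM(A, B) ((A)+(B))
#define TOGGLE(X, SHIFT) ((X) ^= (0x1 << (SHIFT)))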

//multiline functional macros
#define DEBUG(X, ERR)   \
do{                     \
    if(ERR){            \
        printf(X, ERR); \
    }else{              \
        printf("PASS"); \
    }                   \
}while(0);

The multiline macro might need some explaining. First, you don't need to line up the \, it just looks neater. Second, the do-while is generally good to include in every macro. There's a stackoverflow answer on why but in short, it's so that the macro expands properly in code blocks (conditionals, loops, etc.) with or without brackets surrounding the block.

The semi-colon at the end (while(0);) isn't necessary and in the stackoverflow answer it says it defeats the purpose of the expansion, but I put it there anyway just in case I forget to put a ; after my macro calls (though in general you shouldn't put semi-colons in macros, otherwise you can't use them as arguments for functions).

You should always surround your macro definitions with conditional checks. If you don't you will override any existing macros with the same name.

#ifndef MACRO1 //only define these if MACRO1 isn't defined already
#define MACRO1 something
#define MACRO2 something2
...
#endif

In your makefile you can also define which macro definitions to push to your code. This is useful for debugging - sometimes you don't want debug messages to show up and other times you do. You'll see an example of this in some of the project makefiles where we push certain macros to enable/disable certain blocks of code.

Header Files


The all important header file. This can get its own document with how much you can do with it, but we'll simplify it here.

First, if you have any functions that you want to make globally available to any source file that includes your header file, put your function prototypes in that header file - you don't have to put the prototypes in the .c file. The same goes with macros, structs, typedefs, or any other packages you want to include.

It's best to keep common packages out of the header file. The preprocessor just takes your header file and basically copy-pastes it to the top of your source file (not exactly but close enough for what we need it for), so any packages included in your header will be included in every file that includes your header file. Try to be wary of this so you don't repeat common definitions, includes, or end up with circular dependencies.

Second, always surround the header with conditional checks. Having multiple conditional checks might be better but you won't have to worry about that in this class. You'll see them in the example.

Header File (test.h)

#ifndef _TEST_H //this is just checking on a macro name _TEST_H. Is used to avoid getting included again.
#define _TEST_H //define it so the check above fails the next time this header gets included
//ifndef -> if not defined, ifdef -> if defined 

/*include any necessary packages for your .c file that other people might also want to use (but not explicitly include).
Generally speaking, you don't need to include the packages that are in this example in the header file - let the user
decide that - but you might want to include header files that might contain some definitions needed to call the functions
in this header file (macros or typedefs from other header files for example). This decision is a little bit of a
headache and a lot of the time you'll see common packages included in header files anyway.*/
#include<pthread.h>

#define MACRO1 ...
...

typedef struct X{
    int member1;
    char* member2;
}Y_t;

void function1(int, int);
int function2(char*, void*);
#endif

Your C library file (test.c)

#include<stdio.h>
#include<stdlib.h>
#include "path/to/test.h" //make sure to include the headerfile in your library file

void function1(int a, int b){
    //do something
}
int function2(char* a, void* b){
    //return something
    return 0;
}

The C file using your library file (a.c)

#include<stdio.h>
#include<stdlib.h>
#include "path/to/test.h"

int main(int argc, char** argv){
    int x = MACRO1;
    Y_t* z = (Y_t*) malloc(sizeof(Y_t)); //poor naming choice, I know
    function1(1, 2);
    free(z); //give back what you malloc()
    return 0;
}

Other Important Things

Things that are useful, but won't be explained too in-depth. These are still important but either they're seldom used or we'll be going over them in lecture and recitations. I'll leave it to you guys to figure out how to use these (man pages, tutorialspoint, any other documentation available for free online, etc.).

Pointer Arithmetic


Pointer arithmetic isn't something you'll be using often, but again, you should know it since it has its uses.

Pointer arithmetic is simple. You take a pointer... and you add to it (or subtract from it).

What use does this have? Didn't we establish earlier that we don't want to stray from the address a pointer is referring to? The answer is (mainly) arrays (and getting virtual addresses of page tables but you'll get to that in CS416 eventually so don't worry about it for now).

#include<stdio.h>
int main(void){

    int a[5] = {1, 2, 3, 4, 5};
    printf("%d\n", *(a+3)); //dereference the offset pointer to get the value
    return 0;
}

This will print 4 since a is pointing to the start of the array and we accessed an offset of 3. Something to remember is that when doing pointer arithmetic, the number of bytes you jump is based on type. So when you do a+3, you're really doing address_at_0 + (3*sizeof(int) bytes).

This might be particularly useful when manipulating strings.

Ternary statements


Extremely convenient for shortening simpler conditionals. The syntax is the same as Java.

read(), write(), dirent, and other File Based I/O


For the most part, you won't be needing these until the last project. We'll be going over them later on in the semester if you need a refresher, so don't worry too much about these. If you want to learn them now you can check the man pages or tutorialspoint's C tutorial on all of these things.

memcpy(), memset(), calloc()


Rather than initializing every member of a struct manually, if you just need everything to be NULL or zeroed out, you can just use memset(). Alternatively, you can use calloc() which is similar to using malloc() then memset().

Realloc


Reallocating space that you already requested (usually for resizing). Similar to

  1. malloc() new ->
  2. memcpy() old ->
  3. free() old.

Be careful when using this. While it's convenient, it isn't the solution to everything.

Also, regardless of whether you malloc() to reallocate space or realloc(), never reallocate 1 byte at a time unless you really want to fragment your heap and slow down your program. As a general tip, if you don't know how much space you need, start off with a reasonable guess. If you need to reallocate and you don't know how much extra space you need, double your allocation (or make another reasonable guess and go slightly over that), then resize once you know close to the amount you need. Don't do this too often, otherwise your program will spend more time reallocating than it needs to.

Const


Similar to final in Java, but does different things based on where it's placed in a variable declaration.

Static


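This might be a little confusing at first (it doesn't do the same thing as static in Java). Any global, static or not,

  1. is initialized to zero if you don't give it a value and
  2. keeps its value throughout the lifetime of the program.

What static adds on a global is that the global is only visible to the source file it's declared in. Particularly useful when the global is in a different file from main(), since other files can't touch it or clash with its name.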

If it's used on a function, it makes the function visible only to the current source file (similar to private in Java, though not quite).

I'll leave it to you to figure out what it does if you use it locally in a function, but you won't be seeing this usage in this class.

Not Used (in CS416) But Interesting Topics

Since this document is mostly meant for C review (focusing on the class), there are some things that were left out that you can look into on TutorialsPoint. Not all of them are mentioned here, just some notable ones. For the most part you won't need to know these for this class.

Variable Arguments


Similar to *args in Python (there's no keyword-argument equivalent of **kwargs), C can have a varying number of arguments in a function. This isn't used at all in the class, but again it is available on TutorialsPoint and the man pages.

Preprocessors


This isn't really that important for this class. Just know that the preprocessor exists, that it runs before the compiler, and that it's what expands your macros, pulls in your header files, and evaluates checks like #ifndef.

Bit Fields


This is completely new to me as well, but they seem pretty cool. That being said, bit manipulation seems to solve what this is doing in a better way so your mileage may vary.