The Essentials

The next few sections focus on the more important aspects of C that we'll be using a lot in this class (and in general if you're a C programmer).

Structs


NOT objects. They aren't all that similar either. Structs don't belong to a class and they also don't hold any functions, which is key in objects. Structs are, however, abstractions of a group of things.

Structs hold "members", which are the named fields (each with its own type) that the struct stores.

struct node{
    int data;
    struct node *next;
}; //don't forget this semi-colon

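The members of the struct (data and next) are the only valid data that you can put into this struct node. In memory, the size of a struct is at least the sum of the sizes of its members: the compiler may insert padding bytes between members or at the end to keep each member properly aligned, so use sizeof rather than adding the sizes up by hand.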
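To see how a struct's size relates to its members, here is a minimal sketch. The struct padded and the three helper functions are hypothetical names made up for this illustration; the exact numbers depend on your compiler and platform, so the code only compares sizes instead of assuming specific values.

```c
#include <stddef.h> /* offsetof, size_t */

/* Hypothetical struct used only for this illustration. */
struct padded {
    char tag;   /* 1 byte */
    int value;  /* typically 4 bytes, usually aligned to a 4-byte boundary */
};

/* Sum of the member sizes, ignoring any padding. */
size_t padded_member_sum(void) {
    return sizeof(char) + sizeof(int);
}

/* Actual size of the struct as laid out by the compiler. */
size_t padded_actual_size(void) {
    return sizeof(struct padded);
}

/* Byte offset of value inside the struct; anything above 1 is padding after tag. */
size_t padded_value_offset(void) {
    return offsetof(struct padded, value);
}
```

On a typical 64-bit machine the member sum is 5 while the actual size is 8, but the only thing you can rely on is that the actual size is never smaller than the sum.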

Struct initializing and Referencing struct members (without pointers)

int main(...){

    struct node head = {.data = 4, .next = NULL}; //designated initializers like this need C99 or later
    //This is a more explicit way to declare and initialize
    struct node x;
    x.data = 5;
    x.next = NULL;

    //come back to this after reading pointers,
    //dynamic allocation and sizeof
    //Ignore for now.
    struct node *ptr = (struct node*)malloc(sizeof(struct node));
    ptr->data = 5;
    ptr->next = &head;


    int val = head.data; //referencing a member
    int ptr_next_val = ptr->next->data; //==val. ptr->next is itself a pointer, so use '->' again

    free(ptr); //release what we got from malloc
    return 0;
}

It's important to note that local structs that are declared but not initialized are not zeroed out for you. They can contain garbage values, so initialize a struct before you read from it. Skipping the initializer is only safe if you assign every member before its first use.

You can also make locally defined structs. Just do the same syntax inside a function. These would be unavailable for the world outside this function.

Enums and Unions


This is condensed into one section since you won't be using these all that much, but you should know them anyway because they still have their uses outside of the class.

Enums

Enums are just ways to give names to numbers so that your program is easier to read and maintain. In general, giving a name to a value is useful, especially if you use that value a lot for the same things (MACROS).

There are a few ways to declare these but here's the simplest one.

//classic example
enum months{
    JAN, //note this starts from 0, not 1, unless you assign values explicitly (e.g. JAN = 1)
    FEB,
    ... //etc.
}; //enumerators are separated by commas, and the declaration ends with a semi-colon
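Here's a small sketch of an enum in use. The enum weekday and the function weekday_name are hypothetical names chosen for this example; it shows explicit numbering and the common pattern of switching on an enum value.

```c
/* Hypothetical enum for illustration; values start at 0 unless set explicitly. */
enum weekday {
    MON = 1, /* start counting at 1 instead of 0 */
    TUE,     /* 2: each enumerator is one more than the previous */
    WED      /* 3 */
};

/* Map an enumerator to a printable name. */
const char *weekday_name(enum weekday d) {
    switch (d) {
    case MON: return "Monday";
    case TUE: return "Tuesday";
    case WED: return "Wednesday";
    }
    return "unknown"; /* reached only if d holds a value outside the enum */
}
```

Reading weekday_name(TUE) in calling code is a lot clearer than passing a bare 2 around.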

Unions

Unions are a special type of structure. The key difference between a union and a struct is that all members of a union share the same memory, so the size of a union is the size of its largest member (plus any padding). Unions are usually used for statuses or flags, but really they're good for saving space whenever you have several variables and never need any two of them at the same time.

union flags{
    int flag1;
    double flag2;
}; //like a struct, a union declaration ends with a semi-colon

The size of the union in this case is 8 bytes (the size of a double on most platforms). If you populate flag1 with a value, you will only be using 4 of those bytes since an int is typically 4 bytes (on a little-endian machine such as x86, these overlap the low-order bytes of the double). Be careful: if you populate flag2 afterwards, you will lose the value of flag1. Similarly, if you populate flag2 first and then populate flag1, you will overwrite only 4 of the 8 bytes, so if you access flag2 again, you'll get a different (or corrupt) value.
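Since only one member of a union is valid at a time, a common way to stay safe is to store a tag next to it that records which member was written last. The sketch below assumes hypothetical names (value_kind, struct value, make_int, make_double, value_as_double); it is one way to do it, not the only one.

```c
/* Hypothetical tagged union: the enum records which member is currently valid. */
enum value_kind { KIND_INT, KIND_DOUBLE };

struct value {
    enum value_kind kind;
    union {
        int i;
        double d;
    } as; /* both members share the same storage */
};

struct value make_int(int i) {
    struct value v;
    v.kind = KIND_INT;
    v.as.i = i;
    return v;
}

struct value make_double(double d) {
    struct value v;
    v.kind = KIND_DOUBLE;
    v.as.d = d;
    return v;
}

/* Read the value back as a double, using the tag to pick the right member. */
double value_as_double(struct value v) {
    return v.kind == KIND_INT ? (double)v.as.i : v.as.d;
}
```

Checking the tag before every read means you never interpret the bytes of an int as a double by accident.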

Pointers


A bit different from Java references. You can do a lot more with these. In other words, this is how you accidentally shoot yourself in the foot if you aren't careful. Pointers are similar to references in the sense that... they point to things.

Maybe a better explanation would just be to show a practical example.

int main(...){

    int y = 5;
    int* x = &y; //most people do int *x but this is preference

    int* a, b, c; //defines a(pointer) and ints b, c
    int *d, *e, *f; //defines 3 pointers

    *x = 6;

    return 0;
}

int* x is declaring our pointer and we're assigning it to a reference of y - in other words, the address of y. When we want to change the value that x is pointing to we dereference it using *. Changing the value that x is pointing to will also change the value of y. So in this case, we dereference x, which is pointing to y, and change it to 6 - y is now the value 6.

If we change the value of x itself, we run into problems that we may not want. Remember that C lets you do a lot of things, and this is one of them - changing what the pointer is pointing to isn't necessarily wrong, you just have to know what you're changing it to. You could end up pointing to unwanted data and crash your program, or worse, create security vulnerabilities. Since x is holding an address, changing that will make x point to another address. This could be useful for arrays, which we'll get into later, but in general you shouldn't change the value of a pointer unless you know the new address is valid - most of the time you only want to change the value that it's pointing to.

Pass by Reference


C is a pass by value language, meaning a function receives a copy of each argument, so the caller's variable keeps its value even if you change the parameter inside the function. That is, unless you pass a pointer to it (in case you didn't know, Java is exactly the same with its references).

void foo(int a){
    a = 6;
}

void bar(int* a){
    *a = 6;
}

int main(...){

    int a = 9;
    foo(a);
    bar(&a);
    
    return 0;
}

a in foo() won't change the value of a in main(). bar(), however, will since bar() takes in the address of a and dereferences it in order to change the value.

You might remember why from CS211, but in short, a function makes a copy of a pointer or variable when it's passed through, so what you're really changing is that copy if you don't pass in a reference to it.

The same applies if you want to change the pointer itself (useful for arrays). You'll have to pass a pointer to the pointer.

void foo(int** a, int** b){
    *a = *b;
}

int main(...){

    int z = 1;
    int* b = &z;
    int** a = &b; 
    foo(a, &b);

    return 0;
}

This isn't the most practical example (a already points to b, so the assignment in foo() copies b onto itself), but it should get the point across. Here a is a double pointer that points to another pointer, b. We pass the address of b to foo() because of the copying we mentioned earlier: foo() gets copies of the addresses, but dereferencing them reaches the original pointer in main(). If foo() instead took b as a plain int* and stored the address of that parameter (*a = &b), we'd end up pointing to the copy, which won't exist after the function ends, so we'd be pointing to a garbage value later.
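For a slightly more practical use of a pointer to a pointer, here is a minimal sketch in which a function changes where the caller's pointer points. The function name point_to_larger is hypothetical, made up for this example.

```c
/* Make the caller's pointer refer to whichever of a or b holds the larger value.
   out is the address of the caller's pointer, so writing through *out
   changes the caller's pointer and not a local copy of it. */
void point_to_larger(int **out, int *a, int *b) {
    *out = (*a >= *b) ? a : b;
}
```

If you declare int x = 3, y = 8 and int *p = &x, then calling point_to_larger(&p, &x, &y) leaves p pointing at y. Had the function taken a plain int* out and assigned to it, only the function's own copy would have changed.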

Useful Practices with Pass by Reference


This technique will be particularly useful in your projects when initializing values or changing values within functions all while avoiding overuse of pointers or when you want to obtain multiple return values from a function.

Side note: In general, I hate using pointers (and malloc) when I don't have to - particularly with structs since it's more annoying to use '->' instead of '.'. Or when I want to get multiple return values from a function I'd rather not dereference them every time I use them.

For example, I use this often for initializing:

void init(struct node* x){

    x->data = 0;
    x->next = NULL;
    //Oh did I mention that accessing members in a struct is different with pointers?
    //Instead of using '.' you use '->'
}

int main(...){

    struct node x; // note that I didn't use a pointer here when declaring
    init(&x);
    return 0;
}

Multiple return values from a function:

int foo(int* a, int* b, char* file_path1, char* file_path2){

    *a = read(...);
    *b = read(...);
    return 0;
}

int main(...){

    int a;
    int b;
    int status = foo(&a, &b, "some/path", "another/path");
    return 0;
}

Why is this one in particular useful? In this case, we can now get how many bytes were read from each file and also get a status value from the function (if we had conditionals checking for failure in the function; in this case it always returns 0). Extremely useful when debugging a program and it also saves on extra function calls or sometimes on lines of code. Get familiar with using these examples - they'll be useful in any C program you write.
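The same pattern in a form you can compile and run as-is: a sketch where the return value is a status code and the actual results come back through pointers. The function divide and its parameter names are hypothetical, chosen for this example.

```c
/* Divide num by den. The quotient and remainder come back through the pointers;
   the return value is a status code: 0 on success, -1 if den is zero. */
int divide(int num, int den, int *quotient, int *remainder) {
    if (den == 0) {
        return -1; /* outputs are left untouched on failure */
    }
    *quotient = num / den;
    *remainder = num % den;
    return 0;
}
```

The caller declares plain ints, passes their addresses (divide(17, 5, &q, &r)), and checks the status before trusting q and r.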

Malloc and Free


How to dynamically allocate memory for anything.

Let's get one thing clear - you malloc() something, you free() it. Don't be lazy just because a class doesn't check for memory leaks. If you don't want to get familiar with this practice, then don't even think about programming in C/C++.

#include<stdlib.h> //make sure to include this. malloc is declared in this header.
int main(...){

    int* a = (int*) malloc(4);
    free(a);
    return 0;
}

We'll get into complex uses of malloc() in arrays and some simplifications in the sizeof section. This section is going to show what exactly malloc is doing (in the scope of this review - we'll go over more things about malloc() in CS416 but there are even more things that we don't cover in the class. Shameless plug for CS519 - OS Theory).

In this example we have a pointer a getting a dynamically allocated space of 4 bytes. malloc() takes in a value for how many bytes you want to allocate - and it doesn't care what type it is. Remember that C is a language about memory and data management and every piece of data and memory is just a bunch of bits and bytes to your computer (and to C somewhat). It doesn't care what the data is or what it looks like, it just needs to know what you want to do with it (we'll expand on this in the casting section).

Since malloc() doesn't care about type, it returns a void*. In C a void* converts implicitly to any other object pointer type, so casting it to the type we want is optional (the cast is required in C++). I cast here to make the intended type explicit, but the code compiles fine without it.

Now in this example, it isn't all that useful. We just got 4 bytes allocated to a pointer, which we could've done without using malloc(). When we get to arrays, we'll see a much more meaningful example.

As mentioned earlier, when you malloc(), you free() it. Yes, the OS will recollect it all at the end of the program, but that doesn't mean you don't need to free allocations. For example, when you reassign pointers that previously pointed to allocated space, you essentially lost that previously allocated space and now it's just taking up valuable space. Not great if you're working with large data sets that you constantly need to load in and out of memory. Not only that but you open your program up to security vulnerabilities. If you're too lazy to free() see if there's a way for you to write certain parts of your code without malloc().
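To make the "reassigning a pointer loses the old allocation" problem concrete, here is a minimal sketch of replacing an allocation without leaking it. The helper names new_int and replace_int are hypothetical; the only library calls are malloc() and free().

```c
#include <stdlib.h> /* malloc, free */

/* Allocate an int on the heap holding value. Returns NULL if malloc fails. */
int *new_int(int value) {
    int *p = malloc(sizeof *p); /* sizeof *p is the size of whatever p points to */
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* Replace the allocation *slot points to with a fresh one holding value.
   The old block is freed first so it is not leaked when the pointer is reassigned.
   Returns 0 on success, -1 if the new allocation fails (*slot is then NULL). */
int replace_int(int **slot, int value) {
    free(*slot);             /* free(NULL) is a no-op, so an empty slot is fine */
    *slot = new_int(value);
    return *slot != NULL ? 0 : -1;
}
```

Without the free() call in replace_int, every replacement would strand the previous block: still allocated, but with no pointer left to free it.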

An important difference between malloc() and declaring a local variable is that dynamically allocated space lives on the heap while local (automatic) variables live on the stack. (Strictly speaking, "static" storage in C means globals and variables declared static, which live in neither place; this review uses "static" loosely for anything not malloc'd.) Knowing this distinction is important for many things (one of them being threading). Refrain from doing large allocations (e.g. arrays) on the stack; use malloc() for those. Your local variables, including the pointers that hold malloc'd addresses, still live on the stack regardless of the allocation type.

A useful tool to check for memory leaks is valgrind. Just be careful, if you have too many stack frames (lots of recursion or forking or both), valgrind slows down significantly (I may or may not have made an iLab unusable for a couple of hours from forking recursively too many times [around 4000 processes] with valgrind in CS519).

Arrays and Strings


We kept arrays for after pointers and malloc() because there are two ways to use arrays (well the second one isn't really an "array" but we can use it as one since they're basically the same thing).

Arrays

#include<stdlib.h>
int main(int argc, char** argv){
    int a[5] = {1, 2, 3, 4, 5}; //allocated on stack
    int* b = (int*) malloc(20); //allocated on heap
    for(int i = 4; i >= 0; i--){
        b[i] = i; //initializing array values
        a[i%4] = i; //just doing something here to show array accesses are the same in both cases
    }
    free(b); //release the heap allocation; a is released automatically when main returns
    return 0;
}

The first way to make an array is to statically allocate space and manually initialize it. This fixes the size of the array, which is fine if you know what values you want (i.e. a list of delimiters) and you don't need to change the size of the array.

The second way is through malloc() and a pointer. A pointer just points to the starting address of an allocation so that's what we'll use it for here. Notice the number of bytes specified in malloc() - we requested 20 bytes since an integer is 4 bytes and we want 5 of them.

You most likely won't see the first usage too often particularly with integers - they aren't all that convenient either (honestly I forget how to initialize them every time because I don't use them). Maybe with strings you'll use the first one for constant strings.

Aside: a is not literally a pointer, but in most expressions it decays to a pointer to its first element, so you can pass it straight to a function whose parameter is declared int* a. Declaring the parameter as int a[5] or int a[] means exactly the same thing: the size in the brackets is ignored, the function still receives a pointer, and you need to pass the length separately.
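A minimal sketch of passing an array to a function. The name sum_ints is hypothetical; the point is that the function takes a pointer plus an explicit length and works the same for a stack array and a malloc'd one.

```c
#include <stddef.h> /* size_t */

/* Takes a pointer plus an explicit length: the array's size is not carried
   along with the pointer, so the caller has to pass it. */
int sum_ints(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i]; /* same indexing syntax for stack and heap arrays */
    }
    return total;
}
```

With int a[5] = {1, 2, 3, 4, 5}, the call is just sum_ints(a, 5). You can also pass a + 2 with a count of 3 to sum only the last three elements.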

Accessing and modifying an array is the same as Java regardless of which array type you use (static or dynamic).

The Danger of C Arrays

C doesn't protect you against going past the end of an array (ArrayIndexOutOfBoundsException in Java). With GCC or Clang you can compile with -fsanitize=address to catch heap and stack buffer overflows at runtime, and -Werror turns whatever warnings the compiler does emit (for example with -Wall) into errors (read the article "Smashing the Stack for Fun and Profit" for why stack smashing and buffer overflows are crucial to know about). If you write past the end of a buffer, C will let you do so. You can even read past the buffer if you wanted to in some cases.

#include<stdio.h>
int main(...){

    int a[5] = {1, 2, 3, 4, 5}; //valid indexes 0-4
    a[5] = 34;
    printf("%d\n", a[5]);
    return 0; 
}

Here's an example of

  1. writing past your buffer and
  2. reading that written value.

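Both clearly index out of bounds, but C will let you do this. It's undefined behavior: it may appear to work, it may corrupt a neighboring variable, or it may segfault if a[5] falls outside the memory your process is allowed to touch (stack or heap). GCC will still compile it (possibly with a warning), and a build with the stack protector enabled may abort at runtime when it detects stack smashing.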

It is suggested that you go and try to segfault a simple program kind of like this one to find out what not to do. The best way to review is by doing - not reading... and then forgetting.
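Since C won't check bounds for you, one option is to do the check yourself before touching the array. This is a minimal sketch; get_checked is a hypothetical helper name, not a library function.

```c
#include <stddef.h> /* size_t */

/* Bounds-checked read: copies values[index] into *out and returns 0,
   or returns -1 without touching *out when index is out of range. */
int get_checked(const int *values, size_t count, size_t index, int *out) {
    if (index >= count) {
        return -1;
    }
    *out = values[index];
    return 0;
}
```

With the five-element array from the example above, asking for index 5 returns -1 instead of reading whatever happens to sit after the array.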

Strings


C doesn't have a formal way to make or use strings. Instead they're always char arrays with a special ending to the buffer - the null terminator ('\0'). The null terminator lets the program know where the end of the char buffer is (we'll call it string from now on). Without it, if you try to print out a string, the program won't know where to end the buffer and will keep going until it finds a null terminator. Similar to what we mentioned before, you can read past the end of an array so C strings will keep doing that unless a null terminator is specified. What this means is that your printed string might contain garbage values since it doesn't know how to interpret those values.

#include<stdlib.h>
#include<stdio.h>
int main(...){
    char a[6] = "hello"; //array on the stack, initialized with a copy of "hello" plus the null terminator
    char* a2 = "hello"; //pointer to a string literal - looks the same, but the literal must not be modified, and getting its size is more difficult (read sizeof section)
    char* b = malloc(6);
    snprintf(b, 6, "hello"); //don't worry about this. it's just filling in the array - the function also null terminates conveniently.
    free(b);
    return 0;
}

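String a is a fixed-size array that is null terminated for you. Its contents can be changed but its size can't; a2, on the other hand, points to a string literal, which you must treat as read-only. Since "hello" is a five letter word, we need 6 spaces to leave room for the null terminator at the end.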

In the case you malloc() a string, you will need to null terminate it yourself if you manually fill it in. In this example, snprintf() is used for convenience to fill in the buffer and null terminate it.

A word of warning - not all functions null terminate, particularly the ones from string.h (ironically) - so be wary of that.
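strncpy() is the classic example: if the source is as long as or longer than the count you give it, the destination is left without a null terminator. Here is a minimal sketch of a wrapper that always terminates; copy_string is a hypothetical helper name.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* strncpy */

/* Copy src into dst (capacity dst_size bytes), truncating if needed, and always
   null terminate. strncpy alone leaves dst unterminated when src is too long. */
void copy_string(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return; /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1); /* copy at most dst_size - 1 characters */
    dst[dst_size - 1] = '\0';        /* terminate by hand in case of truncation */
}
```

Copying "hello" into a 4-byte buffer this way gives "hel" with a terminator, instead of four characters and no end in sight.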

Aside: in Microsoft's C library, historically only _snprintf was available, and it does not null terminate when the output is truncated. Go Linux.

If you need a list of the buffer functions in stdio.h or string functions in string.h, TutorialsPoint has them in their C documentation (though it's a little outdated since it doesn't have functions like snprintf()).

Sizeof


Another topic we left out until now since we wanted to group all of its examples in one spot instead of spread out in different sections.

sizeof is an extremely convenient operator that can be used for a multitude of things, but its most common usage is in malloc(). In our earlier examples of malloc() we were explicitly putting in the number of bytes we wanted to allocate by doing some simple math. But this is inconvenient and annoying if you need to remember what size a certain type is for every machine (again, this isn't all that relevant anymore but isn't something we should ignore either).

Here are some of the different usages of sizeof

#include<stdlib.h>
#include<stdio.h>
int main(...){

    printf("%zu\n", sizeof(int)); //prints 4 on most platforms. sizeof yields a size_t, so use %zu
    printf("%zu\n", sizeof("hello")); //prints 6: five letters plus the null terminator
    
    double* x = (double*) malloc(sizeof(double)*4); //allocated 8*4 bytes
    free(x);

    int a[5] = {1, 2, 3, 4, 5};
    printf("%zu\n", sizeof(a)/sizeof(int)); //number of elements in a static array (5)
    return 0;
}

Note in the last example, we get the number of elements in a static array. This doesn't work for dynamically allocated arrays: if a were a pointer declared as int* a, sizeof(a) would be the size of the pointer itself (8 bytes on a 64-bit machine). In this example though, sizeof(a) gives 20, which is then divided by the size of an int.
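Two sizeof idioms that come up a lot, sketched under made-up names (ARRAY_LEN and make_zeroed are hypothetical, not standard): a macro for the element count of a real array, and writing sizeof *ptr in malloc() so the size follows the pointer's type.

```c
#include <stddef.h> /* size_t */
#include <stdlib.h> /* malloc, free */

/* Element count of a real array. Only valid where the array itself is in scope;
   applied to a pointer it silently gives the wrong answer. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Allocate count doubles and set them all to zero. sizeof *values follows the
   pointer's type, so the size stays right if the element type ever changes.
   Returns NULL if the allocation fails; the caller must free() the result. */
double *make_zeroed(size_t count) {
    double *values = malloc(count * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        values[i] = 0.0;
    }
    return values;
}
```

For a malloc'd array there is nothing like ARRAY_LEN: you have to remember the count you asked for and carry it around yourself.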

Casting


Casting in C works differently from Java. In Java, if you cast data to a certain type, it will try to shape it into that type - hence why it sometimes (all the time) complains that you can't cast a certain type into another.

But remember that C doesn't care. Data is just a bunch of 1's and 0's - it doesn't matter what the shape is. You can turn anything into anything. Literally.

You have a struct that's an abstraction of a node? Cast it to a char* because why not. It'll probably come out with garbage values when you try to print it but you can do it.

A more practical usage you'll see is in your third project. This is critical since it's how you'll be manipulating your virtual addresses.

int main(...){

    unsigned long x = 0xFFFFFF;
    void* ptr = (void*) x; //you will literally be doing things like this in your project
}

Another example is when you're moving raw bytes through a buffer (for example writing a value to a socket).

int socket = some_file_descriptor;
int number;
write(socket, (char*)&number, sizeof(int)); //address of "number" then casted to char*

The cast is unnecessary here: write() takes a const void* for its buffer, and any object pointer (here an int*) converts to void* implicitly. It's kept in the example to make it explicit that the int is being treated as a plain run of bytes.
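To show what "treating an int as bytes" looks like without a socket, here is a minimal sketch that copies an int into a byte buffer and back. The names int_to_bytes and bytes_to_int are hypothetical; memcpy() is used because it sidesteps alignment problems that casting the buffer to int* could cause.

```c
#include <string.h> /* memcpy */

/* Copy the raw bytes of value into out, which must hold sizeof(int) bytes.
   The byte order is whatever the host machine uses. */
void int_to_bytes(int value, unsigned char *out) {
    memcpy(out, &value, sizeof value);
}

/* Rebuild an int from bytes produced by int_to_bytes on the same machine. */
int bytes_to_int(const unsigned char *in) {
    int value;
    memcpy(&value, in, sizeof value);
    return value;
}
```

This round-trips on one machine. Between machines you would also have to agree on a byte order, which this sketch deliberately leaves out.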

One more example just to show you that you can really do whatever you want, though not without consequences (so many compiler warnings)

#include<stdio.h>
int main(...){

    int a = (int) "abcd";
    printf("%s\n", (char*)&a); //this will print garbage values and not "abcd" but the point is that C will let you run this.
    return 0;
}

Bit Manipulation


Get used to bit manipulation in Operating Systems. It's everywhere. Bit shifting, masking, toggling, etc. You should have seen this in CS211 and for the malloc project in CS214 (assuming you got your "metadata" down to 2 bytes, which requires you to use bit manipulation).

If you're wondering why it's so important - bit shifting is fast... really fast (just a single instruction fast). Not only that, it allows you to save space keeping track of things. For example, to keep track of allocated blocks in disk (let's say it has 8 blocks), you can use a single int to keep track of the block status. The first bit for the first block, second for the second, etc. (don't worry we'll go over this exact thing later in CS416).
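The block-tracking idea can be sketched in a few lines. This assumes a hypothetical 8-block disk tracked in a single uint8_t (instead of a full int), and the function names block_set, block_clear and block_in_use are made up for this example.

```c
#include <stdint.h> /* uint8_t */

/* Each bit of the map tracks one of 8 blocks: 1 = allocated, 0 = free.
   block must be in the range 0-7. */

/* Mark a block as allocated by OR-ing in its bit. */
void block_set(uint8_t *map, unsigned block) {
    *map |= (uint8_t)(1u << block);
}

/* Mark a block as free by AND-ing with everything except its bit. */
void block_clear(uint8_t *map, unsigned block) {
    *map &= (uint8_t)~(1u << block);
}

/* Return 1 if the block is allocated, 0 if it is free. */
int block_in_use(uint8_t map, unsigned block) {
    return (map >> block) & 1u;
}
```

Eight separate status variables collapse into one byte, and each operation is a shift plus a single AND or OR.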

Here's a quick rundown

int main(...){

    int x = 1;
    x = x << 1; //left shift by one. x is now 2. You can shorten this by doing x <<= 1
    x >>= 1; //right shift by one. x is now 1. This one is already shortened syntax
    if(x & 1){} //bitwise AND
    if(x | 1){} //bitwise OR
    if(~x){} //bitwise NOT
    if(x^1){} //bitwise XOR
    return 0;
}

Bit Masking


This is a bit tricky at first but once you practice it enough you'll understand it. What you're doing here is extracting bits out of a value to get only certain bits that you want. Again, this is commonly used when tracking the status of things in your operating system so you'll need to get used to seeing these.

Note: I'm intentionally using unsigned integers here otherwise we won't get the desired result for some of these.

If we had the hex 0xFF (one byte of all 1's) and we wanted only the last 4 bits (going left to right) we would do

uint32_t last_four = 0xFF & 0xF;

0xF is our mask, and this is just 4 bits of 1's. If we bitwise AND, then it'll give us what the last four bits of 0xFF are. If we convert it to binary it looks like this

0xFF: 1111 1111
0xF:  0000 1111
---------------
      0000 1111

If we wanted the first four bits, we would have to right shift 0xFF by 4 bits and then apply the mask with bitwise AND again.

uint32_t first_four = (0xFF >> 4) & 0xF;
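The shift-then-mask pattern generalizes to pulling any run of bits out of a value. A minimal sketch, with extract_bits as a hypothetical helper name:

```c
#include <stdint.h> /* uint32_t, UINT32_C */

/* Return `width` bits of value starting at bit `shift` (bit 0 is the lowest).
   Assumes 0 < width < 32 and shift < 32. */
uint32_t extract_bits(uint32_t value, unsigned shift, unsigned width) {
    uint32_t mask = (UINT32_C(1) << width) - 1; /* the low `width` bits set to 1 */
    return (value >> shift) & mask;
}
```

extract_bits(0xFF, 0, 4) and extract_bits(0xFF, 4, 4) reproduce last_four and first_four from above. As an illustration only, if you assume a 32-bit address with a 12-bit offset, extract_bits(addr, 0, 12) would be the offset and extract_bits(addr, 12, 20) the remaining upper bits; the real split in your project may differ.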

A more practical example: (here's a tip in this problem as well. If you're converting from decimal to binary, sometimes it's easier to convert to hex and then write out the binary representation of each hex digit, and vice versa)

uint32_t mask = 33; //00100001 in binary or 0x21 in hex
uint32_t value = 104; //01101000 in binary or 0x68 in hex

uint32_t AND_mask = value & mask; //00100000 or 32 in decimal or 0x20 in hex
uint32_t OR_mask = value | mask; //01101001 or 105 or 0x69 (nice)

Here's how you toggle a bit

uint32_t x = 0b01100101; //binary literal: the 0b prefix is a GCC/Clang extension (standard in C23); 0x65 is the portable equivalent. A leading 0 on its own would mean octal, not binary
uint32_t toggle = 1;

//let's toggle the 5th bit
x ^= (toggle << 4); //now it's 01110101
x ^= (toggle << 4); //back to 01100101