Lab 2: C Memory Management, Valgrind

Deadline: Friday, September 9, 11:59:59 PM PT


Setup

In your labs directory, pull the files for this lab with:

git pull starter main

If you get an error like the following:

fatal: 'starter' does not appear to be a git repository
fatal: Could not read from remote repository.

make sure to set the starter remote as follows:

git remote add starter https://github.com/61c-teach/fa22-lab-starter.git

and run the original command again.


Exercise 1: Bit Operations

For this exercise, you will complete bit_ops.c by implementing the bit manipulation functions get_bit, set_bit, and flip_bit (shown below). You may ONLY use bitwise operations such as and (&), or (|), xor (^), not (~), left shifts (<<), and right shifts (>>). You may not use any for/while loops or conditional statements. You also may not use modulo (%), division, addition, subtraction, or multiplication for this question.

/* Returns the Nth bit of X. Assumes 0 <= N <= 31. */
unsigned get_bit(unsigned x, unsigned n) {
    /* YOUR CODE HERE */
    return -1; /* UPDATE WITH THE CORRECT RETURN VALUE*/
}

/* Set the nth bit of the value of x to v. Assumes 0 <= N <= 31, and V is 0 or 1 */
void set_bit(unsigned *x, unsigned n, unsigned v) {
    /* YOUR CODE HERE */
}

/* Flips the Nth bit in X. Assumes 0 <= N <= 31.*/
void flip_bit(unsigned *x, unsigned n) {
    /* YOUR CODE HERE */
}

Once you complete these functions, you can compile and run your code using the following commands:

make bit_ops
./bit_ops

This will print out the results of the tests.


Exercise 2: Using GDB to find segfaults

Recall that a segmenation fault (or segfault) occurs when our program tries to access an area of memory that it does not have permissions to access. This error can be a result of trying to access memory that does not belong to the program or by trying to dereference a NULL pointer. Let's take a look at how we can use GDB to find the location of a segfault. We've provided you with a buggy implementation of ll_has_cycle from the last lab.

  1. Compile the code by executing

    make ll_cycle
    
  2. Execute the program. We can see that it segfaults, but we are not given any information about where the segfault occurred.

  3. We can use GDB to find out more information. Run GDB on the executable using the following command:

    cgdb ./ll_cycle
    
  4. Run the program until it segfaults using the run command.

  5. We can see that our program segfaulted on line 9 of ll_cycle.c. This corresponds to the line hare = hare->next->next;.

  6. We can examine the current state of the program to understand why the it segfaulted. Since we are trying to access hare on this line, let's print out the struct

    p hare
    

    Remember, this is just going to print out the pointer, so this information is not useful to us. If we want to examine the contents of the struct, we need to dereference it.

    p *hare
    

    We can see that the next pointer is NULL. This shows us that when the program executes hare = hare->next->next;, it is trying to access the next field of this NULL pointer which is what is causing the segmentation fault.

  7. Fix the error in the program and rerun it to make sure that it's not failing anymore.


Exercise 3: Valgrind

Even with a debugger, we might not be able to catch all bugs. Some bugs are what we refer to as "bohrbugs", meaning they manifest reliably under a well-defined, but possibly unknown, set of conditions. Other bugs are what we call "heisenbugs", and instead of being determinant, they're known to disappear or alter their behavior when one attempts to study them. We can detect the first kind with debuggers, but the second kind may slip under our radar because they're (at least in C) often due to mis-managed memory. Remember that unlike other programming languages, C requires you (the programmer) to manually manage your memory.

We can use a tool called Valgrind to help catch to help catch "heisenbugs" and "bohrbugs". Valgrind is a program which emulates your CPU and tracks your memory accesses. This slows down the process you're running (which is why we don't, for example, always run all executables inside Valgrind) but also can expose bugs that may only display visible incorrect behavior under a unique set of circumstances.

Using Valgrind to find segfaults

Let's see how we can use valgrind to find the same bug that we just found with gdb. Add back the bug to ll_has_cycle so that we can see how to detect it with valgrind.

  1. Run valgrind on the executable using the following command:
    valgrind ./ll_cycle
    

Let's break down the output. Below we can see the same print statements that are generated when we run our code normally.

valgrind ./ll_cycle
==4662== Memcheck, a memory error detector
==4662== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4662== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4662== Command: ./ll_cycle
==4662== 
Running tests...

Checking first list for cycles. There should be none, ll_has_cycle says it has no cycle
Checking second list for cycles. There should be a cycle, ll_has_cycle says it has a cycle
Checking third list for cycles. There should be a cycle, ll_has_cycle says it has a cycle
Checking fourth list for cycles. There should be a cycle, ll_has_cycle says it has a cycle

In the next chunk, we can see that we are performing an invalid read of size 8. This means that we are trying to read 8 bytes from a memory location that we do not have access to. Valgrind provides us with the call stack (the list of function calls) that led to the segfault. We can see that line 50 in main (located in test_ll_cycle.c) makes a call to ll_has_cycle (located in ll_cycle.c). We can then see that line 9 in ll_cycle.c is where the segfault occurs.

==4662== Invalid read of size 8
==4662==    at 0x108C4B: ll_has_cycle (ll_cycle.c:9)
==4662==    by 0x108B36: main (test_ll_cycle.c:50)
==4662==  Address 0x8 is not stack'd, malloc'd or (recently) free'd

The next chunk tells us that we encountered a segfault and repeats the information above.

==4662== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==4662==  Access not within mapped region at address 0x8
==4662==    at 0x108C4B: ll_has_cycle (ll_cycle.c:9)
==4662==    by 0x108B36: main (test_ll_cycle.c:50)

We can ignore this next part because we don't believe that we encountered a stack overflow (we have not used enough memory for that).

==4662==  If you believe this happened as a result of a stack
==4662==  overflow in your program's main thread (unlikely but
==4662==  possible), you can try to increase the size of the
==4662==  main thread stack using the --main-stacksize= flag.
==4662==  The main thread stack size used in this run was 8388608.

The next chunk tells us if there was any heap memory allocated when the program terminated. "All heap blocks were freed -- no leaks are possible" tells us that we have no leaks.

==4662== HEAP SUMMARY:
==4662==     in use at exit: 0 bytes in 0 blocks
==4662==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==4662== 
==4662== All heap blocks were freed -- no leaks are possible

The next line tells us how to display surpressed errors (we don't care about that). And finally, we can see in the summary that we have 1 error.

==4662== For counts of detected and suppressed errors, rerun with: -v
==4662== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Update the code so that this error no longer occurs. Once you update it, your valgrind output should look like this. We can see in the error summary at the bottom that there are no errors.

valgrind ./ll_cycle
==12790== Memcheck, a memory error detector
==12790== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12790== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==12790== Command: ./ll_cycle
==12790== 
Running tests...

Checking first list for cycles. There should be none, ll_has_cycle says it has no cycle
Checking second list for cycles. There should be a cycle, ll_has_cycle says it has a cycle
Checking third list for cycles. There should be a cycle, ll_has_cycle says it has a cycle
Checking fourth list for cycles. There should be a cycle, ll_has_cycle says it has a cycle
Checking fifth list for cycles. There should be none, ll_has_cycle says it has no cycle
Checking length-zero list for cycles. There should be none, ll_has_cycle says it has no cycle

Congrats, you passed all the test cases!
==12790== 
==12790== HEAP SUMMARY:
==12790==     in use at exit: 0 bytes in 0 blocks
==12790==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==12790== 
==12790== All heap blocks were freed -- no leaks are possible
==12790== 
==12790== For counts of detected and suppressed errors, rerun with: -v
==12790== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Invalid reads and memory leaks

Let's take a look at the Bork translation program! Bork is an ancient language that is very similar to English. To translate a word to Bork, you take the English word and add an 'f' after every vowel in the word.

Let's see if we can understand some Bork. Compile and run bork using the following commands.

make bork
./bork hello

An example output is provided below. Note that your output will probably look different.

Input string: "hello"
Length of translated string: 21
Translate to Bork: "hefl2?^?Ul2?^?Uof?^?U"

Hmm, Bork is an old language, but there shouldn't be all of these strange characters. It seems that perhaps the ancients left some bugs in their program! Shall we embark on a journey to squash bugs and uncover the true beauty of Bork?

If we take a brief glance at main, we can see that we are taking an input string (src_str) and translating it to Bork (dest_str). If we scroll to the top, we can see that we have a function (alloc_str) to allocate space for a string in the heap, a Str struct which contains a string and it's length, a make_str function which will create a Str struct and initialize its data and len field, and a function to free our struct's data. There is also a function to concate two strings together and another function to translate a letter to Bork. Now this is quite a long program to debug.

Wouldn't it be nice if there were a tool that gave us a good first place to look?

Well as it turns out, there are a couple and valgrind is one of them!

Let's run valgrind on our program using the following command.

valgrind ./bork hello
==10170== Memcheck, a memory error detector
==10170== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10170== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==10170== Command: ./bork hello
==10170== 
==10170== Invalid read of size 1
==10170==    at 0x4C34D04: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x10879F: make_Str (bork.c:22)
==10170==    by 0x108978: translate_to_bork (bork.c:56)
==10170==    by 0x1089F2: main (bork.c:68)
==10170==  Address 0x522f041 is 0 bytes after a block of size 1 alloc'd
==10170==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x108781: alloc_str (bork.c:10)
==10170==    by 0x10895E: translate_to_bork (bork.c:54)
==10170==    by 0x1089F2: main (bork.c:68)
==10170== 
==10170== Invalid read of size 1
==10170==    at 0x4C34D04: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x10879F: make_Str (bork.c:22)
==10170==    by 0x108952: translate_to_bork (bork.c:51)
==10170==    by 0x1089F2: main (bork.c:68)
==10170==  Address 0x522f0e2 is 0 bytes after a block of size 2 alloc'd
==10170==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x108781: alloc_str (bork.c:10)
==10170==    by 0x10892D: translate_to_bork (bork.c:48)
==10170==    by 0x1089F2: main (bork.c:68)
==10170== 
Input string: "hello"
Length of translated string: 7
==10170== Invalid read of size 1
==10170==    at 0x4C34D04: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x4E9B4A2: vfprintf (vfprintf.c:1643)
==10170==    by 0x4EA2EE5: printf (printf.c:33)
==10170==    by 0x108A6F: main (bork.c:74)
==10170==  Address 0x522f317 is 0 bytes after a block of size 7 alloc'd
==10170==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x108781: alloc_str (bork.c:10)
==10170==    by 0x108833: concat (bork.c:32)
==10170==    by 0x108A15: main (bork.c:69)
==10170== 
Translate to Bork: "hefllof"
==10170== 
==10170== HEAP SUMMARY:
==10170==     in use at exit: 7 bytes in 1 blocks
==10170==   total heap usage: 11 allocs, 10 frees, 1,051 bytes allocated
==10170== 
==10170== LEAK SUMMARY:
==10170==    definitely lost: 7 bytes in 1 blocks
==10170==    indirectly lost: 0 bytes in 0 blocks
==10170==      possibly lost: 0 bytes in 0 blocks
==10170==    still reachable: 0 bytes in 0 blocks
==10170==         suppressed: 0 bytes in 0 blocks
==10170== Rerun with --leak-check=full to see details of leaked memory
==10170== 
==10170== For counts of detected and suppressed errors, rerun with: -v
==10170== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0)

(Interesting side note: when we look at the normal program output in this valgrind log, we see normal behavior (i.e. it prints "hefllof"). That's because the way valgrind runs our program is different than how our program runs "naturally" (aka "bare metal"). We're not going to get into that for now.)

But back on debugging: A good general rule of thumb to follow when parsing big error logs is to only consider the first error message (and ignore the rest), so let's do that:

==10170== Invalid read of size 1
==10170==    at 0x4C34D04: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10170==    by 0x10879F: make_Str (bork.c:22)
==10170==    by 0x108978: translate_to_bork (bork.c:56)
==10170==    by 0x1089F2: main (bork.c:68)

The error message states that we are doing an invalid read of size 1. What does this mean? An invalid read means that your program is reading memory at a place that it shouldn't be (this can cause a segfault, but not always). Size 1 means that we were attempting to read 1 byte.

Because we're unfamiliar with this ancient codebase and we don't want to read all of it to find the bug, a good process to follow is to start at high-level details and work our way down (so basically work our way through the call stack that valgrind provides).

Let's look at bork.c line 68 in main (the botton of the stack):

Str bork_substr = translate_to_bork(src_str.data[i]);

Is something funky going on here? Looks like we are just passing a character to translate_to_bork. Seems ok so far.

Let's go farther down the call stack and look at bork.c line 56 in translate_to_bork:

return make_Str(res);

We're just calling make_Str here. We should go deeper. Let's look at bork.c line 22.

return (Str){.data=str,.len=strlen(str)};

Here we are making a new Str struct and setting its data and len parameters. That seems normal too!

But valgrind says that strlen is doing an invalid read?

Well, we're passing a string to it right? What does strlen do again? It determines the length of a string by iterating over each character until it gets to a null terminator. Maybe there is no null terminator so strlen keeps going past the end of the string (which would mean that it's going past the area that we allocated for the string).

Let's make sure our string has a null terminator by checking where we created it.

Earlier, we saw this on line 56 in translate_to_bork.

return make_Str(res);

If we look two lines up (line 54), we can see that we are allocating space for the string by calling alloc_str. Let's take a look at this function.

char *alloc_str(int len) {
    return malloc(len*sizeof(char));
}

Hmmm. It looks like alloc_str is giving us some memory that's only len big, which means when we write to the string in translate_to_bork, we don't have enough space for a null terminator!

Let's make the following change to fix the problem:

10c10,12
<     return malloc(len*sizeof(char));
---
>     char *data = malloc((len+1)*sizeof(char));
>     data[len] = '\0';
>     return data;

Let's run our program to see if we fixed the problem

./bork hello
Input string: "hello"
Length of translated string: 7
Translate to Bork: "hefllof"

Everything looks like it's working properly. However, there could be hidden errors that we cannot see, so let's run our code through valgrind to make sure that there are no underlying issues.

valgrind ./bork hello
==29797== Memcheck, a memory error detector
==29797== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==29797== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==29797== Command: ./bork hello
==29797== 
Input string: "hello"
Length of translated string: 7
Translate to Bork: "hefllof"
==29797== 
==29797== HEAP SUMMARY:
==29797==     in use at exit: 8 bytes in 1 blocks
==29797==   total heap usage: 11 allocs, 10 frees, 1,061 bytes allocated
==29797== 
==29797== LEAK SUMMARY:
==29797==    definitely lost: 8 bytes in 1 blocks
==29797==    indirectly lost: 0 bytes in 0 blocks
==29797==      possibly lost: 0 bytes in 0 blocks
==29797==    still reachable: 0 bytes in 0 blocks
==29797==         suppressed: 0 bytes in 0 blocks
==29797== Rerun with --leak-check=full to see details of leaked memory
==29797== 
==29797== For counts of detected and suppressed errors, rerun with: -v
==29797== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Let's take a look at the heap summary below. It tells us that we had 8 bytes in 1 block allocated at the time of exit. This means that the memory in the heap that was not free'd stems from one allocation call and that it is 8 bytes large.

Next, we can see the heap summary which shows that we made 11 allocation calls and 10 frees over the lifetime of the program.

==29797== HEAP SUMMARY:
==29797==     in use at exit: 8 bytes in 1 blocks
==29797==   total heap usage: 11 allocs, 10 frees, 1,061 bytes allocated

Now let's take a look at the leak summary below. This just states that we lost 8 bytes in 1 block.

==29797== LEAK SUMMARY:
==29797==    definitely lost: 8 bytes in 1 blocks
==29797==    indirectly lost: 0 bytes in 0 blocks
==29797==      possibly lost: 0 bytes in 0 blocks
==29797==    still reachable: 0 bytes in 0 blocks
==29797==         suppressed: 0 bytes in 0 blocks
==29797== Rerun with --leak-check=full to see details of leaked memory

It tells us to "Rerun with --leak-check=full to see details of leaked memory", so let's do that.

valgrind --leak-check=full ./bork hello
==32334== Memcheck, a memory error detector
==32334== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32334== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==32334== Command: ./bork hello
==32334== 
Input string: "hello"
Length of translated string: 7
Translate to Bork: "hefllof"
==32334== 
==32334== HEAP SUMMARY:
==32334==     in use at exit: 8 bytes in 1 blocks
==32334==   total heap usage: 11 allocs, 10 frees, 1,061 bytes allocated
==32334== 
==32334== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
==32334==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32334==    by 0x108784: alloc_str (in /home/cc/cs61c/fa22/staff/cs61c-tac/bork)
==32334==    by 0x10884E: concat (in /home/cc/cs61c/fa22/staff/cs61c-tac/bork)
==32334==    by 0x108A30: main (in /home/cc/cs61c/fa22/staff/cs61c-tac/bork)
==32334== 
==32334== LEAK SUMMARY:
==32334==    definitely lost: 8 bytes in 1 blocks
==32334==    indirectly lost: 0 bytes in 0 blocks
==32334==      possibly lost: 0 bytes in 0 blocks
==32334==    still reachable: 0 bytes in 0 blocks
==32334==         suppressed: 0 bytes in 0 blocks
==32334== 
==32334== For counts of detected and suppressed errors, rerun with: -v
==32334== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Now Valgrind is telling us the location where the unfree'd block was initially allocated. Let's take a look at this below. If we follow the call stack, we can see that malloc was called by alloc_str which was called by concat in main.

==32334== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
==32334==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32334==    by 0x108784: alloc_str (in /home/cc/cs61c/fa22/staff/cs61c-tac/bork)
==32334==    by 0x10884E: concat (in /home/cc/cs61c/fa22/staff/cs61c-tac/bork)
==32334==    by 0x108A30: main (in /home/cc/cs61c/fa22/staff/cs61c-tac/bork)

If we look in main, we can see that we allocate the space for dest_str by calling concat, but we never free it. We need dest_str until the end of the program, so let's free it right before we return from main. This struct was allocated on the stack in main (Str dest_str={};), so we do not need to free the struct itself. However, the data that the struct points to was allocated in the heap. Therefore, we only need to free this portion of the struct. If you take a look near the top of the program, we have already provided a function free_Str to free the allocated portion of the struct. Let's call this function at the end of our program.

76a77
>     free_Str(dest_str);

You might be wondering why we are not freeing src_str. If we take a look at where we constructed src_str (Str src_str = make_Str(argv[1]);), we can see that it was created using make_str which does not make any calls to allocate space on the heap. The string that we are using to make src_str comes from argv[1]. The program that calls main is responsible for setting up argv[1], so we don't have to worry about it.

Once we fix our error, the valgrind output should look like this. The heap summary shows that there are no blocks allocated at the time we exit. The error summary at the bottom shows us that there are no errors to report.

valgrind ./bork hello
==10835== Memcheck, a memory error detector
==10835== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10835== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==10835== Command: ./bork hello
==10835== 
Input string: "hello"
Length of translated string: 7
Translate to Bork: "hefllof"
==10835== 
==10835== HEAP SUMMARY:
==10835==     in use at exit: 0 bytes in 0 blocks
==10835==   total heap usage: 11 allocs, 11 frees, 1,061 bytes allocated
==10835== 
==10835== All heap blocks were freed -- no leaks are possible
==10835== 
==10835== For counts of detected and suppressed errors, rerun with: -v
==10835== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Exercise 4: Memory Management

This exercise uses vector.h, test_vector.c, and vector.c, where we provide you with a framework for implementing a variable-length array. This exercise is designed to help familiarize you with C structs and memory management in C.

  1. Try to explain why bad_vector_new() is bad. We have provided the reason here, so you can verify your understanding

    bad_vector_new() The vector is created on the stack, instead of the heap. All memory stored on the stack gets freed as soon as that function finishes running, so when the function returns, we lose the vector we constructed.
  2. Fill in the functions vector_new(), vector_get(), vector_delete(), and vector_set() in vector.c so that our test code test_vector.c runs without any memory management errors.

    Comments in the code describe how the functions should work. Look at the functions we've filled in to see how the data structures should be used. For consistency, it is assumed that all entries in the vector are 0 unless set by the user. Keep this in mind as malloc() does not zero out the memory it allocates.

Test your implementation of vector_new(), vector_get(), vector_delete(), and vector_set() for both correctness and memory management (details below).

# 1) to check correctness
make vector
./vector

# 2) to check memory management using Valgrind:
valgrind ./vector

Any number of suppressed errors is fine; they do not affect us.

Feel free to also use CGDB to debug your code.


Exercise 5

Please fill out this short survey about your experience with the lab. Your responses will be used to improve the lab in the future. The survey will be collecting your email to verify that you have submitted it, but your responses will be anonymized when the data is analyzed. Thank you!


Submission

Save, commit, and push your work, then submit to the Lab 2 assignment on Gradescope.