Inside the Bytes: Dynamically Sized Arrays From The Ground Up!



In the last article we discussed memory representation of primitive data structures in C. We also went over the building blocks of abstraction fixed sized arrays, and structs. https://hamdaan-rails-personal.herokuapp.com/articles/28

In this article we will use the previously discussed concepts to build up a commonly used data structures from the ground up, Lists (Dynamically sized arrays).



Lists/Dynamically Sized Arrays


C standard library provides us with the primitive data structure of an array. An array is a contiguous block of memory of fixed size that stores the same type of data. This fixed size is provided during initialization of the array, and cannot be changed once initialized. This is useful when you know how big of an array you need, but what happens when the size of your array is known to you only at runtime? It expands, and it shrinks. For this use case, we need the abstraction of a dynamically sized array. This is provided in higher level languages by the standard library such as the list in C, and ArrayList in Java. The C standard library does not provide such a data structure though, we need to build one.


For our DynamicArray data structure we are going to rely on structs, and arrays. We will expose a limited set of functions as our API that let a user interact with this abstract data structure without knowing the details of how it works. The Api functions will be:


struct DynamicArray {
    void** data; // Heap allocated array of data
    int capacity; // metadata to work with data dynamically
    int occupied; // number of slots in data array that are actually occupied
    unsigned int sizeOfElement; // how many bytes is each element we are actually storing
};


// Initializer: sets up the internal array and initial values. The arr parameter is modified after being
// passed as input. This lets user decide whether to allocate the arr on heap or stack.
int dynamicArrayNew(struct DynamicArray* arr, unsigned int sizeOfElement);


// copies data pointed to by element into the first available index in the array. 
// The invoker may choose to dispose off the element since the data has been copied over.
// If there is data being copied over that is a pointer, we copy over the address. 
int dynamicArrayAddElement(struct DynamicArray* arr, void* element);


// Go to the index, delete the item, and accordingly move the elements leftwards as needed.
void dynamicArrayRemoveByIdx(struct DynamicArray* arr, int index);


// Perform a side effect user defined function on each of the elements
void dynamicArrayForEach(struct DynamicArray* arr, void (*func)(void*));


The way the dynamic array is meant to work is that we maintain metadata of how the capacity of the internal data array, the number of slots occupied and we hold the array in the struct. Upon initialization we allocate space for this array on the heap. We allocate the internal data array on the heap since we know it is going to increase and decrease in size during runtime, and we want flexibility in how much data we allocate for this array. Initially we choose to create an array of N items where N is an arbitrary number that denotes the available capacity. When creating a brand new array we know we have 0 spaces occupied. The allocation for the data array is performed by calling malloc, and multiplying the the size of an address * N, this is because we want to essentially create an array of pointers where each pointer points to a different block in memory that holds the data. We chose to create an array of pointers instead of an array of "int" or "char" so we could have an array store any type of data (bypassing typing). Well if can store any kind of data at the pointer on each index of the array how do we know how much space to allocate for this data? For that we maintain metadata of how many bytes does the data take. During initialization the user is expected to pass the size of the data they want to store in the array, for example a user may pass => sizeof(struct Node). This size of data is stored as metadata in the struct and used when we actually want to add items to the array. That takes care of the initialization, the code below will show the above logic.


int dynamicArrayNew(struct DynamicArray* arr, unsigned int sizeOfElement) {
    arr->sizeOfElement = sizeOfElement;
    arr->occupied = 0;
    arr->capacity = 8;
    arr->data = malloc(arr->capacity * sizeof(void*));
	EXIT_IF_MALLOC_FAIL(arr->data); // macro that prints an error and returns -1

	return 0;
}



Next we move on to being able to add items to our dynamic array. Adding items is not as straightforward. The user passes in the dynamic array struct and a pointer to the data they want to store. We already know the size of the data the user wants to store since that is contained in the metadata. We trust that the user will not pass in pointer to data that is not of the size the user specified during initialization, this will cause a segmentation fault.

During addition of an item there are 2 scenarios we can run into. In the first scenario we have an open spot for our data in the internal array, and the second scenario is that we are at max occupancy. This is figured out by the metadata fields of occupied and capacity. If occupied == capacity then we enter scenario two, else we are in scenario one.

In the first scenario since we have enough space, we can just add the element to the first unoccupied slot in our array. We do so by indexing into the first unoccupied spot in the array, which will be at index "occupied", we already have space allocated here to store a pointer, so we create space on the heap using malloc of the data size mentioned in the metadata in the struct, we then copy over the bytes from the user passed element into this new space on the heap, and store the pointer to this stored data at the index for occupied, we then increment the occupied int on the struct to mark that we have one more slot occupied.

In the second scenario, we have already performed the first scenario multiple times and our entire capacity is occupied. At this point we need to perform what is known as resizing. We allocate twice as many slots as our current internal array has and then move our data into the newly spacious double of previous size array. We then mark the metadata to indicate we have doubled our capacity and we have enough occupancies for another element to be added. At this point we add the element as described in the first scenario to the new spacious array.


int dynamicArrayAddElement(struct DynamicArray* arr, void* element) {
    if (arr->occupied == arr->capacity) {
        void** newData = realloc(arr->data, arr->capacity * 2 * sizeof(void*));
		EXIT_IF_MALLOC_FAIL(newData);


        arr->data = newData;
        arr->capacity *= 2;
    }


    arr->data[arr->occupied] = malloc(arr->sizeOfElement);
	EXIT_IF_MALLOC_FAIL(arr->data[arr->occupied]);


    memcpy(arr->data[arr->occupied], element, arr->sizeOfElement);


    arr->occupied++;


    return 0;
}



Next we explore how we can remove items from the array based on the index. This is done by the method dynamicArrayRemoveByIdx, this method takes the dynamicArray struct and an index as parameters and removes the item stored at that index. After removing the items at that index all the elements to the right of the index for removal are shifted one slot left in the internal array. Removal of the actual element is made possible by calling free on the address of the index in the internal data array. When we remove items from the array we also have to account for what happens when we have a huge capacity, but the number of slots occupied is very low, in this case we have a lot of memory we have reserved that is unused. To tackle reserving unused memories, we resize the internal array down. We keep a track of the metadata in the struct. All this is shown in the function below.


void dynamicArrayRemoveByIdx(struct DynamicArray* arr, int index) {
    // range check
    if (index < 0 || index >= arr->occupied) {
        printf("Invalid index\n");
        return;
    }

    // actually remove the element from the memory
    free(arr->data[index]);


    // Move elements after the removed element one slot left
    for (int i = index; i < arr->occupied - 1; i++) {
        arr->data[i] = arr->data[i + 1];
    }

    // manage the metadata to reflect the operation
    arr->occupied--;


    // Check if capacity needs to be reduced
    if (arr->occupied <= arr->capacity / 4) {
        void** newData = realloc(arr->data, (arr->capacity / 2) * sizeof(void*));
        if (newData) {
            arr->data = newData;
            arr->capacity /= 2;
        }
    }
}



The api also provides a higher order function that lets users perform a side-effect on the data in the dynamic array. This method purely depends on the metadata in the struct to iterate over the internal data array and call the user passed function.


void dynamicArrayForEach(struct DynamicArray* arr, void (*func)(void*)) {
    for (int i = 0; i < arr->occupied; i++) {
        printf("Element %d: ", i);
        func(arr->data[i]);
    }
}



Now with the API complete we can see the demonstration of this data structure below:


// print util we use in the forEach when data we are storing is an int
void printInt(void* element) {
    int* numPtr = (int*)element;
    printf("%d\n", *numPtr);
}

// print util we use in the forEach when data we are storing is a string
void printString(void* element) {
    char* str = (char*)element;
    printf("%s\n", str);
}


int main() {
    struct DynamicArray arr;
	
	int init = dynamicArrayNew(&arr, sizeof (int)); // initialize our data structure
	if (init == -1) {
		return 1;
	}

    // since our dynamic array takes pointers to data, we need to store the data as a variable
    int num1 = 42;
    int num2 = 78;
    int num3 = 13;


    dynamicArrayAddElement(&arr, &num1); 
    dynamicArrayAddElement(&arr, &num2); 
    dynamicArrayAddElement(&arr, &num3);

    // call the functional forEach with passing in the util
    dynamicArrayForEach(&arr, printInt);


    // Free memory after we are done using the data structure
    for (int i = 0; i < arr.occupied; i++) {
        free(arr.data[i]);
    }
    free(arr.data);


	// Now with strings
	char* test[] = {"abc", "def" , "jkl", "mno", "pqr", "stu", "vwx", "yz1", "234",
		"567", "890", "001", "002", "003", "004"};
	
    struct DynamicArray strArr;
	int initStrArr = dynamicArrayNew(&strArr, strlen(test[0]) + 1);
	if (initStrArr == -1) {
		return 1;
	}
	
	for (int i = 0; i < 15; i++) {
		dynamicArrayAddElement(&strArr, test[i]);
	}


    dynamicArrayForEach(&strArr, printString);
	
	// remove the 5th, 8th, 10th, 1st elements
	int removals[5] = {2, 1, 4, 6, 10};
	for (int i = 0; i < 5; i++) {
		dynamicArrayRemoveByIdx(&strArr, removals[i]);
	}


	dynamicArrayForEach(&strArr, printString);


	// Free memory
    for (int i = 0; i < strArr.occupied; i++) {
        free(strArr.data[i]);
    }
    free(strArr.data);


	printf("Finished Successfully \n");
    return 0;
}



Implementing a dynamically sized list clearly showcases the run time complexity and memory complexity of the different APIs, we see why certain access patterns such as addition can be performed in constant time thanks to pointer arithmetic.


In the next article we will be going over the next common data structure a HashTable!


Comments

Commenter: Hamdaan Khalid

ah frick I forgot to add how to get an item at an index. The get is a constant time operation we can perform thanks to being able to access the data in the array via indexing. Adding the code here: void* dynamicArrayGetItemAtIdx(struct DynamicArray* arr, int index) { if (index < 0 || index >= arr->occupied) { printf("Invalid index\n"); return NULL; } return arr->data[index]; }

Add a comment: