CptS 355 - Programming Language Design
Washington State University

Garbage Collection

Outline

The paper Advantages and Disadvantages of Conservative Garbage Collection, which I handed out, addresses many of the issues described here, with emphasis on conservative collection. If you can't find your copy, here is the link: http://www.hpl.hp.com/personal/Hans_Boehm/gc/issues.html.

  • Why garbage collection?
    • Dangling references
    • Memory leaks
  • Main kinds of garbage collectors
    • Reference counting
    • Mark-and-sweep
    • Copying
    • Incremental
    • Conservative
    Why garbage collection?

    Two big problems in programming with allocated memory:
    • dangling references, where a pointer points to memory that has been de-allocated (and perhaps re-allocated for another purpose). If you're lucky, following a dangling pointer leads immediately to a program crash. If you are unlucky, following a dangling pointer leads to mysterious program behavior that may be extremely difficult to debug.
    • memory leaks, where memory is allocated, used for a while, and then forgotten about without being freed. Memory leaks lead to programs whose memory use grows over time. They are usually much more of a problem in long-running programs such as operating systems, window managers, text editors, etc.
    Especially in code that involves concurrency and exceptions, it can be extremely difficult to keep straight which piece of a program is responsible for de-allocating an object. One early work on GC claimed that in the large programs being developed in the Mesa language (which has concurrency and exceptions), over 40% of programmer effort was devoted to memory management.

    I later programmed in the Cedar language, which succeeded Mesa and which had GC, and memory management problems were hardly an issue in much of the programming that I did.

    Reference counting

    Note: some authors say that reference counting is not garbage collection, reserving that term for the other varieties below. This doesn't make any sense to me.

    Basic idea: each memory object contains a reference count field that is initialized to 1 when the object is created. When an existing pointer is assigned to a variable or field, the reference count of the object it points to is incremented. When a field or variable is assigned to, the reference count of the object that it previously pointed to is decremented. When the reference count reaches 0, the program can no longer access the object, so it can be collected and its space reused.
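
    The bookkeeping just described can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the way any particular collector is implemented: the names Obj, create, assign, release, and live_objects are hypothetical, and each object is assumed to have a single pointer field.

```cpp
#include <cassert>

int live_objects = 0;  // hypothetical counter so the example can observe reclamation

struct Obj {
    int refcount = 1;     // initialized to 1 when the object is created
    Obj* next = nullptr;  // a single pointer field, for illustration only
    Obj() { ++live_objects; }
    ~Obj() { --live_objects; }
};

// Create an object. The initial count of 1 stands for the variable that
// receives the returned pointer.
Obj* create() { return new Obj; }

// Drop one reference to `o`; reclaim the object when its count reaches 0.
void release(Obj* o) {
    if (o == nullptr) return;
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        release(o->next);  // the dead object's own field no longer refers to anything
        delete o;
    }
}

// Store `value` into `slot` (a variable or a field), maintaining the counts.
void assign(Obj*& slot, Obj* value) {
    if (value != nullptr) ++value->refcount;  // increment first so self-assignment is safe
    Obj* old = slot;
    slot = value;
    release(old);  // the slot no longer refers to its previous target
}
```

    In this sketch, creating an object through p and then copying p into a second variable q with assign raises the count to 2; assigning null to both variables through assign brings the count to 0, at which point the object is reclaimed.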

    The biggest problem with reference counting is that it cannot deal with cyclic data structures. The following code fragment creates a cyclic data structure. Work out the reference count on each object following each statement.

           /* assume appropriate node struct definition */
           p = new node;
           q = p->next = new node;
           r = q->next = new node;
           /* close the cycle */
           r->next = p;
           /* drop the references from outside */
           p = null;
           q = null;
           r = null;
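
    One way to watch the same effect on a real machine is with C++'s std::shared_ptr, which keeps a reference count per object. The sketch below is a stand-in for the fragment above rather than a model of any particular collector: Node and live_after_dropping are hypothetical names, and std::weak_ptr is used only to observe whether a node still exists without adding to its count.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;  // reference-counted pointer to the next node
};

// Build the three-node chain, optionally close the cycle, drop p, q, and r,
// and return how many of the three nodes are still alive afterwards.
int live_after_dropping(bool close_cycle) {
    std::shared_ptr<Node> p = std::make_shared<Node>();
    std::shared_ptr<Node> q = p->next = std::make_shared<Node>();
    std::shared_ptr<Node> r = q->next = std::make_shared<Node>();
    if (close_cycle) r->next = p;  // close the cycle

    // The first node is referenced by p, plus by r->next if the cycle is closed.
    assert(p.use_count() == (close_cycle ? 2 : 1));

    // Weak observers: they do not contribute to the reference counts.
    std::weak_ptr<Node> wp = p, wq = q, wr = r;

    // Drop the references from outside.
    p = nullptr;
    q = nullptr;
    r = nullptr;

    int live = 0;
    if (!wp.expired()) ++live;
    if (!wq.expired()) ++live;
    if (!wr.expired()) ++live;

    // Break the cycle by hand so that this example itself does not leak.
    if (std::shared_ptr<Node> first = wp.lock()) first->next = nullptr;
    return live;
}
```

    With the cycle closed, all three nodes survive the loss of p, q, and r because each is still referenced from inside the cycle; without the cycle, dropping p, q, and r reclaims all three.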
    

    Mark-and-sweep collection

    Basic idea: each object contains a mark bit that is initialized to 0 when the object is allocated. To perform GC: starting from a root set consisting of all the global and static variables as well as variables on the call stack, mark all objects that can be recursively reached. When finished marking, scan the heap (sweep) from beginning to end for unmarked objects: the fact that they are unmarked means that they are unreachable, hence they can be collected. Clear the mark bits of objects that are not collected to prepare for the next collection.

    The main problem with mark-and-sweep (M&S) is that marking and sweeping may take a long time -- many seconds for large memories with lots of reachable data. If virtual memory is in use, M&S collection times can extend to minutes as pages are swapped in from and out to disk.

    Re-visit the code fragment above to see how M&S overcomes the difficulty that RC had with the cyclic structure.
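
    The mark and sweep phases can be sketched in C++ as follows, under simplifying assumptions: the heap is a vector of objects that refer to each other by index, and the root set is passed in explicitly. Heap, Object, allocate, mark, and collect are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object {
    bool marked = false;              // the mark bit, 0 when the object is allocated
    bool allocated = true;            // becomes false once the sweep reclaims the object
    std::vector<std::size_t> fields;  // indices of the objects this object points to
};

struct Heap {
    std::vector<Object> objects;

    std::size_t allocate() {
        objects.emplace_back();
        return objects.size() - 1;
    }

    // Mark phase: recursively mark everything reachable from object i.
    void mark(std::size_t i) {
        assert(i < objects.size());
        Object& o = objects[i];
        if (!o.allocated || o.marked) return;  // the marked test also stops us on cycles
        o.marked = true;
        for (std::size_t f : o.fields) mark(f);
    }

    // One full collection. Returns the number of objects reclaimed.
    std::size_t collect(const std::vector<std::size_t>& roots) {
        for (std::size_t r : roots) mark(r);
        std::size_t reclaimed = 0;
        for (Object& o : objects) {  // sweep: scan the heap from beginning to end
            if (!o.allocated) continue;
            if (o.marked) {
                o.marked = false;    // survivor: clear the bit for the next collection
            } else {
                o.allocated = false; // unmarked means unreachable
                o.fields.clear();
                ++reclaimed;
            }
        }
        return reclaimed;
    }
};
```

    If three objects are linked into a cycle and one of them is a root, collect reclaims nothing; once the root set is empty, the same call reclaims all three, because nothing marks them even though they still point to each other.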

    Copying collection

    Basic idea: when collecting, move all reachable objects into a new memory area as they are encountered while following pointers. When finished, all reachable objects are in the new memory area; the old memory area becomes the new memory area for the next collection. What I've called the old memory area is typically called the from space, and the new memory area is typically called the to space. At the end of a collection the roles of from space and to space are swapped.

    Advantage of copying collection: it solves the fragmentation problem, which other collectors and even most manual memory management schemes do not address. As a program runs and allocates objects of different sizes, memory may be chopped up into little pieces, so that in aggregate there is a lot of free memory but none of the pieces is big enough to satisfy a program's allocation request. Copying makes all the allocated blocks contiguous in to space, so there is no fragmentation. However, note that at most half of memory can be used at once, because the other half has to be available to serve as to space during collection. (With the large size of modern memories and careful design of how the allocator places objects, fragmentation problems can usually be avoided anyway.)

    Copying collection can also be time-consuming, but notice that unlike M&S, which has to scan all of memory to find the mark bits, copying collection looks only at reachable memory. Also note that copying collection requires that all pointer values be updated during the copying, because objects move.
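
    A copying collection can be sketched in C++ as follows. The sketch assumes a toy heap in which each space is a vector of objects, pointers are indices into the current space, and each object has one pointer field. It scans to space in order in the style usually attributed to Cheney, which is one common way to organize the copy and is not something these notes prescribe. Object, Space, forward, collect, and kNull are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNull = -1;  // hypothetical encoding of a null pointer

struct Object {
    int value = 0;
    int next = kNull;  // index of another object in the same space, or kNull
};

using Space = std::vector<Object>;

// Copy the object at from-space index i into to space unless it has already
// been moved, and return its to-space index.
int forward(const Space& from, Space& to, std::vector<int>& forwarding, int i) {
    if (i == kNull) return kNull;
    assert(i >= 0 && static_cast<std::size_t>(i) < from.size());
    if (forwarding[i] == kNull) {
        forwarding[i] = static_cast<int>(to.size());
        to.push_back(from[i]);  // its field still holds a from-space index for now
    }
    return forwarding[i];
}

// Collect: copies everything reachable from the roots, updates the roots in
// place, and returns the new to space.
Space collect(const Space& from, std::vector<int>& roots) {
    Space to;
    std::vector<int> forwarding(from.size(), kNull);
    for (int& r : roots) r = forward(from, to, forwarding, r);
    // Scan the copied objects in order. Copying more objects extends `to`,
    // so the loop ends once every reachable object has been scanned.
    for (std::size_t scan = 0; scan < to.size(); ++scan) {
        int moved = forward(from, to, forwarding, to[scan].next);
        to[scan].next = moved;  // update the pointer because the target moved
    }
    return to;
}
```

    Only objects reachable from the roots are ever touched, they end up contiguous at the start of to space, and every pointer (including each root) is rewritten to the object's new location. Unreachable objects are simply left behind in from space.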

    Incremental collection

    During an M&S or copying collection as described above, the program cannot do useful work. These GC pauses are often so long that people find systems using these basic collectors unusable. Incremental collection was invented to solve this problem: it spreads out the work of the collector so that it occurs in small chunks, which can be interleaved with execution of the user program. Incremental collection is what makes large GC'd systems like Java and Microsoft .NET usable for interactive programs.
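
    The idea of doing collector work in small chunks can be sketched for the marking phase as follows. This is an assumption-heavy illustration: IncrementalMarker, shade, start, and step are hypothetical names, and the sketch is only correct if the program does not change any pointers between chunks. Real incremental collectors need extra machinery to cope with the program modifying the heap while a collection is in progress, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object {
    bool marked = false;
    std::vector<std::size_t> fields;  // indices of the objects this object points to
};

struct IncrementalMarker {
    std::vector<Object>& heap;
    std::vector<std::size_t> pending;  // marked objects whose fields are not yet scanned

    explicit IncrementalMarker(std::vector<Object>& h) : heap(h) {}

    // Mark object i and remember that its fields still need scanning.
    void shade(std::size_t i) {
        assert(i < heap.size());
        if (!heap[i].marked) {
            heap[i].marked = true;
            pending.push_back(i);
        }
    }

    // Begin a collection by shading the roots.
    void start(const std::vector<std::size_t>& roots) {
        for (std::size_t r : roots) shade(r);
    }

    // Scan at most `budget` objects, then return control to the program.
    // Returns true once marking is complete.
    bool step(std::size_t budget) {
        while (budget > 0 && !pending.empty()) {
            std::size_t i = pending.back();
            pending.pop_back();
            for (std::size_t f : heap[i].fields) shade(f);
            --budget;
        }
        return pending.empty();
    }
};
```

    The program would call step with a small budget from time to time (for example at allocation points), so that no single pause is longer than the time needed to scan that many objects. The final set of marked objects is the same as if marking had been done all at once.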

    Conservative collection

    In tracing and copying collectors, a key algorithmic step is following the pointers in reachable objects to find other reachable objects. How does the collector know where these pointers are? There are several possibilities, such as tagging pointers with special bit patterns or storing type codes in objects, which allows the collector to find a pointer map for each object. A relatively recent technique is conservative collection, in which any bit pattern that might be a pointer is treated as if it is a pointer. The effect of this approximation is that some objects that are in fact unreachable may be treated as reachable. The only consequence of this is that there is less free memory than there might be -- but all reachable objects are retained.
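
    The "might be a pointer" test can be sketched in C++ with a simulated heap. The assumptions here are loud ones: addresses are just numbers, the heap is a map from an object's start address to the raw words stored in it, and a word counts as a possible pointer only if it equals the start address of some object. Word, Heap, mark_from, conservative_mark, and example_retained_count are hypothetical names, and real conservative collectors make this decision in more refined ways.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using Word = std::uintptr_t;
using Heap = std::map<Word, std::vector<Word>>;  // start address -> words in the object

// Treat w as a pointer if it matches an object's start address, and mark
// everything reachable from it under the same rule.
void mark_from(const Heap& heap, Word w, std::set<Word>& reachable) {
    Heap::const_iterator it = heap.find(w);
    if (it == heap.end()) return;             // cannot be a pointer into the heap
    if (!reachable.insert(w).second) return;  // already visited
    for (Word inner : it->second) mark_from(heap, inner, reachable);
}

// The roots are raw words (for example the contents of the stack), with no
// type information about which of them are really pointers.
std::set<Word> conservative_mark(const Heap& heap, const std::vector<Word>& roots) {
    std::set<Word> reachable;
    for (Word w : roots) mark_from(heap, w, reachable);
    return reachable;
}

// Usage sketch: the stack holds a real pointer (0x1000), an integer that
// merely looks like a pointer (0x3000), and a small integer (5).
std::size_t example_retained_count() {
    Heap heap;
    heap[0x1000] = std::vector<Word>{0x2000, 42};  // really points to 0x2000
    heap[0x2000] = std::vector<Word>{7};
    heap[0x3000] = std::vector<Word>{9};           // nothing really points here
    std::vector<Word> stack = {0x1000, 0x3000, 5};
    std::set<Word> live = conservative_mark(heap, stack);
    assert(live.count(0x3000) == 1);  // retained only because an integer matched
    return live.size();
}
```

    In the usage sketch the object at 0x3000 is kept even though no real pointer refers to it, which is exactly the cost described above: some memory stays occupied that a precise collector would have freed, but nothing reachable is ever lost.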
                                                                                                                                                                                                                                                                                                                                                 
  (c) 2003 Curtis Dyreson, (c) 2004, 2005 Carl H. Hauser           E-mail questions or comments to Prof. Carl Hauser