Tuesday, February 14, 2017

Large Scale C++ Projects

C++ continues to be one of the preferred programming languages to develop professional applications and there is much information that documents C++ coding standards and how to write Effective C++.

However, it seems rare to find information that documents how to use the C++ language effectively;
That is: write C++ in such a way that minimizes cyclic, link and compile-time dependencies in C++.

Objective
The objective of this post is to gain insight into the design of large, high-quality C++ software systems:
Create highly maintainable + testable software that uses increasing number of classes and lines of code.

Reference
The majority of information here can be found in Large-Scale C++ Software Design book by John Lakos.

Definitions
 Declaration  Introduces a name into a program
 Definition  Provides a unique description of an entity within a program
 Translation Unit  Single implementation file that includes all header files
 Internal Linkage  Name local to translation unit and cannot collide with identical name at link time
 External Linkage  In a multi-file program a name can interact with other translation units at link time

Issues
Poorly written C++ is difficult to understand + maintain. As programs get large the following issues occur:

Cyclic Dependencies
C++ components have tendency to get tangled up in each other which results in tightly physical coupling;
e.g. 2x components Rectangle and Window: Rectangle uses the Window and Window uses the Rectangle.
 // rectangle.h
 #ifndef _INCLUDED_RECTANGLE
 #define _INCLUDED_RECTANGLE

 class Window;

 class Rectangle
 {
 public:
     Rectangle(const Window& w);
 };
 #endif//_INCLUDED_RECTANGLE
 // window.h
 #ifndef _INCLUDED_WINDOW
 #define _INCLUDED_WINDOW

 class Rectangle;

 class Window
 {
 public:
     Window(const Rectangle& r);
 };
 #endif//_INCLUDED_WINDOW
Now, both objects in the system must know about each other and #include the other object's header file.
Therefore, it is now not possible to compile, link, test and use one component without the using other!

Excessive Link-Time
Cyclic dependencies make it necessary to link most dependent objects in order to test any of them at all.
Not only does this increase the executable size but can make the linking process unduly slow and painful!

Excessive Compile-Time
When developing C++ programs, changes to a single header file can sometimes cause many translation units to recompile. Why? Because the unnecessary inclusion of one header file by another is a common source of excessive coupling in C++.

Ignoring compile-time dependencies can cause translation units to unnecessarily include header files.
This results in excessive compile-time dependencies which can reduce compilation speeds to a crawl!

Poor Physical Design
This addresses design issues surrounding the physical entities of a system e.g. files, directories, libraries as well as organizational issues e.g. compile-time and link-time dependencies between physical entities.

Good physical design implications often dictate the outcome of logical design decisions;
Therefore avoid logical design choices that would imply cyclic physical dependencies.

Testing Challenges
Complex, well-designed software systems are built from layered parts that should have been tested thoroughly in isolation, then within a sequence partial subsystems, and as a fully integrated product.

Therefore, it must be possible to decompose the entire system into its single units of functionality.

Summary
Cyclic link-time dependencies among translation units can undermine understanding, testing, reusability. Unnecessary or excessive compile-time dependencies increase compilation cost + destroy maintainability.

Most C++ design books address only the logical design issues of project (classes, functions, inheritance).
However, quality of physical design often dictates improved outcome of many logical design decisions!

Techniques
Two important techniques are discussed to mitigate cyclic dependencies in C++:
 Levelization  Techniques used to help reduce link-time dependencies
 Insulation  Techniques used to minimize compile-time dependencies


Levelization
Levelization refers to the technique for reducing the link-time dependencies within a system.
Therefore, a system is levelizable when there are no cyclic dependencies, that is, acyclic.

Link-time dependencies within a system establish the overall physical quality of a system;
e.g. understandability, maintainability, testability and reusability tied to a physical design.

The following sections list techniques to help reduce link-time dependencies:

Escalation
Refer back to the original example of 2x mutually dependent objects: Rectangle and Window.
Move the cyclic functionality into a higher level component; a technique called escalation.

If peer components are dependent then it may be possible to escalate interdependent functionality;
Each component becomes static members in a higher-level component that depends on the original.
 // boxutil.h
 class Rectangle;
 class Window;
 
 class BoxUtil
 {
 public:
     static Rectangle toRectangle(const Window& w);
     static Window toWindow(const Rectangle& r);
 };

Demotion
Move the common functionality to lower levels of the physical hierarchy; a technique called demotion.
The new lower-level component is shared upon each of which the original component(s) depends.

Opaque Pointers
In order to compile a function, the compiler needs to know the size and layout of the object it uses.
The compiler does this by #include the header file of the component containing the class definition.

A function f uses a type T in size if compiling the body of f requires having first seen definition of T.
A function f uses a type T in name only if compiling f does not require having seen definition of T.

A pointer is opaque if the type definition to which it points is not included in current translation unit.
This technique is sometimes referred to as the Pointer to Implementation idiom or the Pimpl idiom.
 // handle.h
 class Foo;
 
 class Handle
 {
 public:
     Handle(Foo* foo) : p_opaque(foo) {}
 
     Foo* get() const { return p_opaque; }
     void set(Foo* foo) { p_opaque = foo; }
 
 private:
     Foo* p_opaque;
 };

Dumb Data
Any kind of information that an object holds but does not know how to interpret.
Such data must be used in the context of another object, at a higher level.

E.g. rather than identifying objects by opaque pointer, we may use a short integer index instead. However, an indexed approach does sacrifice type safety when compared to opaque pointers.

Redundancy
This technique deliberately repeats code or data to avoid or remove unwanted physical dependencies.
The additional coupling associated with some forms may outweigh the advantage gained from reuse.

Callbacks
A function supplied by a client to allow lower-level component to take advantage of higher-level context. Callbacks are useful in event-based programming to break dependencies between co-operating classes.

However, when used inappropriately, callbacks can blur the responsibility of the lower-level objects.
This can result in unnecessary conceptual coupling that is difficult to understand, maintain + debug.

Factoring
Extracting pockets of cohesive functionality: move them where they can be independently tested and reused; General technique for reducing the burden imposed by cyclicly dependent classes. Factoring:
  • A system into smaller components: more flexible yet more complex pieces to work
  • A concrete class into two classes containing higher and lower levels of functionality
  • An abstract base class into 2x classes: pure interface and partial implementation

Escalating Encapsulation
A common misconception is that each component must encapsulate all implementation details.
Instead we can hide a number of useful low-level classes behind single component: Wrapper.
Note: Wrapper is a component-based implementation of the Façade design pattern.

Summary
By proactively engineering the system as a levelizable collection of components creates a hierarchy of modular abstractions that can be understood, tested, reused independently from the rest of the design.

Techniques
 Escalation  Move mutually dependent functionality higher in physical hierarchy
 Demotion  Move mutually dependent functionality lower in physical hierarchy
 Opaque Pointers  Having an object use another in name only [Pimpl idiom]
 Dumb Data  Use data that indicates dependency on peer object in higher object context
 Redundancy  Deliberately avoid reuse: repeat small amounts of code to avoid coupling
 Callbacks  Client-supplied functions enable lower-level subsystems to perform tasks
 Factoring  Move independent testable behaviour from implementation avoids coupling
 Escalating Encapsulation  Implementation hidden from clients to higher level in physical hierarchy

Conclusion
In conclusion, this post discussed Levelization techniques to help reduce link-time dependencies in Large Scale C++ Projects. Next we will discuss Insulation techniques to help reduce compile-time dependencies.

This will be the topic in the next post.