MindView Inc.

Thinking in C++, 2nd ed. Volume 1

©2000 by Bruce Eckel


5: Hiding the Implementation

A typical C library contains a struct and some associated functions to act on that struct. So far, you’ve seen how C++ takes functions that are conceptually associated and makes them literally associated by putting the function declarations inside the scope of the struct, changing the way functions are called for the struct, eliminating the passing of the structure address as the first argument, and adding a new type name to the program (so you don’t have to create a typedef for the struct tag).

These are all convenient – they help you organize your code and make it easier to write and read. However, there are other important issues when making libraries easier in C++, especially the issues of safety and control. This chapter looks at the subject of boundaries in structures.

Setting limits

In any relationship it’s important to have boundaries that are respected by all parties involved. When you create a library, you establish a relationship with the client programmer who uses that library to build an application or another library.

In a C struct, as with most things in C, there are no rules. Client programmers can do anything they want with that struct, and there’s no way to force any particular behaviors. For example, even though you saw in the last chapter the importance of the functions named initialize( ) and cleanup( ), the client programmer has the option not to call those functions. (We’ll look at a better approach in the next chapter.) And even though you would really prefer that the client programmer not directly manipulate some of the members of your struct, in C there’s no way to prevent it. Everything’s naked to the world.

There are two reasons for controlling access to members. The first is to keep the client programmer’s hands off tools they shouldn’t touch, tools that are necessary for the internal machinations of the data type, but not part of the interface the client programmer needs to solve their particular problems. This is actually a service to client programmers because they can easily see what’s important to them and what they can ignore.

The second reason for access control is to allow the library designer to change the internal workings of the structure without worrying about how it will affect the client programmer. In the Stack example in the last chapter, you might want to allocate the storage in big chunks, for speed, rather than creating new storage each time an element is added. If the interface and implementation are clearly separated and protected, you can accomplish this and require only a recompile and relink by the client programmer, with no changes to the client’s source code.

C++ access control

C++ introduces three new keywords to set the boundaries in a structure: public, private, and protected. Their use and meaning are remarkably straightforward. These access specifiers are used only in a structure declaration, and they change the boundary for all the declarations that follow them. Whenever you use an access specifier, it must be followed by a colon.

public means all member declarations that follow are available to everyone. public members are like struct members. For example, the following struct declarations are identical:

//: C05:Public.cpp
// Public is just like C's struct

struct A {
  int i;
  char j;
  float f;
  void func();
};

void A::func() {}

struct B {
public:
  int i;
  char j;
  float f;
  void func();
};

void B::func() {}  

int main() {
  A a; B b;
  a.i = b.i = 1;
  a.j = b.j = 'c';
  a.f = b.f = 3.14159;
  a.func();
  b.func();
} ///:~

The private keyword, on the other hand, means that no one can access that member except you, the creator of the type, inside function members of that type. private is a brick wall between you and the client programmer; if someone tries to access a private member, they’ll get a compile-time error. In struct B in the example above, you may want to make portions of the representation (that is, the data members) hidden, accessible only to you:

//: C05:Private.cpp
// Setting the boundary

struct B {
private:
  char j;
  float f;
public:
  int i;
  void func();
};

void B::func() {
  i = 0;
  j = '0';
  f = 0.0;
}

int main() {
  B b;
  b.i = 1;    // OK, public
//!  b.j = '1';  // Illegal, private
//!  b.f = 1.0;  // Illegal, private
} ///:~

Although func( ) can access any member of B (because func( ) is a member of B, thus automatically granting it permission), an ordinary global function like main( ) cannot. Of course, neither can member functions of other structures. Only the functions that are clearly stated in the structure declaration (the “contract”) can have access to private members.

There is no required order for access specifiers, and they may appear more than once. They affect all the members declared after them and before the next access specifier.

protected

The last access specifier is protected. protected acts just like private, with one exception that we can’t really talk about right now: “Inherited” structures (which cannot access private members) are granted access to protected members. This will become clearer in Chapter 14 when inheritance is introduced. For current purposes, consider protected to be just like private.

Friends

What if you want to explicitly grant access to a function that isn’t a member of the current structure? This is accomplished by declaring that function a friend inside the structure declaration. It’s important that the friend declaration occurs inside the structure declaration because you (and the compiler) must be able to read the structure declaration and see every rule about the size and behavior of that data type. And a very important rule in any relationship is, “Who can access my private implementation?”

The class controls which code has access to its members. There’s no magic way to “break in” from the outside if you aren’t a friend; you can’t declare a new class and say, “Hi, I’m a friend of Bob!” and expect to see the private and protected members of Bob.

You can declare a global function as a friend, and you can also declare a member function of another structure, or even an entire structure, as a friend. Here’s an example:

//: C05:Friend.cpp
// Friend allows special access

// Declaration (incomplete type specification):
struct X;

struct Y {
  void f(X*);
};

struct X { // Definition
private:
  int i;
public:
  void initialize();
  friend void g(X*, int); // Global friend
  friend void Y::f(X*);  // Struct member friend
  friend struct Z; // Entire struct is a friend
  friend void h();
};

void X::initialize() { 
  i = 0; 
}

void g(X* x, int i) { 
  x->i = i; 
}

void Y::f(X* x) { 
  x->i = 47; 
}

struct Z {
private:
  int j;
public:
  void initialize();
  void g(X* x);
};

void Z::initialize() { 
  j = 99;
}

void Z::g(X* x) { 
  x->i += j; 
}

void h() {
  X x;
  x.i = 100; // Direct data manipulation
}

int main() {
  X x;
  Z z;
  z.g(&x);
} ///:~

struct Y has a member function f( ) that will modify an object of type X. This is a bit of a conundrum because the C++ compiler requires you to declare everything before you can refer to it, so struct Y must be declared before its member Y::f(X*) can be declared as a friend in struct X. But for Y::f(X*) to be declared, struct X must be declared first!

Here’s the solution. Notice that Y::f(X*) takes the address of an X object. This is critical because the compiler always knows how to pass an address, which is of a fixed size regardless of the object being passed, even if it doesn’t have full information about the size of the type. If you try to pass the whole object, however, the compiler must see the entire structure definition of X, to know the size and how to pass it, before it allows you to define or call a function such as Y::g(X).

By passing the address of an X, the compiler allows you to make an incomplete type specification of X prior to declaring Y::f(X*). This is accomplished in the declaration:

struct X;

This declaration simply tells the compiler there’s a struct by that name, so it’s OK to refer to it as long as you don’t require any more knowledge than the name.

Now, in struct X, the function Y::f(X*) can be declared as a friend with no problem. If you tried to declare it before the compiler had seen the full specification for Y, it would have given you an error. This is a safety feature to ensure consistency and eliminate bugs.

Notice the two other friend functions. The first declares an ordinary global function g( ) as a friend. But g( ) has not been previously declared at the global scope! It turns out that friend can be used this way to simultaneously declare the function and give it friend status. This extends to entire structures:

friend struct Z;

is an incomplete type specification for Z, and it gives the entire structure friend status.

Nested friends

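Under the original C++ rules, making a structure nested doesn’t automatically give it access to private members. (Since C++11, a nested structure is itself a member of the enclosing structure and has the same access rights as any other member, so the form shown here remains legal but is no longer required.) To accomplish this under the original rules, you must follow a particular form: first, declare (without defining) the nested structure, then declare it as a friend, and finally define the structure. The structure definition must be separate from the friend declaration, otherwise it would be seen by the compiler as a non-member. Here’s an example: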

//: C05:NestFriend.cpp
// Nested friends
#include <iostream>
#include <cstring> // memset()
using namespace std;
const int sz = 20;

struct Holder {
private:
  int a[sz];
public:
  void initialize();
  struct Pointer;
  friend struct Pointer;
  struct Pointer {
  private:
    Holder* h;
    int* p;
  public:
    void initialize(Holder* h);
    // Move around in the array:
    void next();
    void previous();
    void top();
    void end();
    // Access values:
    int read();
    void set(int i);
  };
};

void Holder::initialize() {
  memset(a, 0, sz * sizeof(int));
}

void Holder::Pointer::initialize(Holder* rv) {
  h = rv;
  p = rv->a;
}

void Holder::Pointer::next() {
  if(p < &(h->a[sz - 1])) p++;
}

void Holder::Pointer::previous() {
  if(p > &(h->a[0])) p--;
}

void Holder::Pointer::top() {
  p = &(h->a[0]);
}

void Holder::Pointer::end() {
  p = &(h->a[sz - 1]);
}

int Holder::Pointer::read() {
  return *p;
}

void Holder::Pointer::set(int i) {
  *p = i;
}

int main() {
  Holder h;
  Holder::Pointer hp, hp2;
  int i;

  h.initialize();
  hp.initialize(&h);
  hp2.initialize(&h);
  for(i = 0; i < sz; i++) {
    hp.set(i);
    hp.next();
  }
  hp.top();
  hp2.end();
  for(i = 0; i < sz; i++) {
    cout << "hp = " << hp.read()
         << ", hp2 = " << hp2.read() << endl;
    hp.next();
    hp2.previous();
  }
} ///:~

Once Pointer is declared, it is granted access to the private members of Holder by saying:

friend struct Pointer;

The struct Holder contains an array of ints and the Pointer allows you to access them. Because Pointer is strongly associated with Holder, it’s sensible to make it a member structure of Holder. But because Pointer is a separate class from Holder, you can make more than one of them in main( ) and use them to select different parts of the array. Pointer is a structure instead of a raw C pointer, so you can guarantee that it will always safely point inside the Holder.

The Standard C library function memset( ) (in <cstring>) is used for convenience in the program above. It sets all memory starting at a particular address (the first argument) to a particular value (the second argument) for n bytes past the starting address (n is the third argument). Of course, you could have simply used a loop to iterate through all the memory, but memset( ) is available, well-tested (so it’s less likely you’ll introduce an error), and probably more efficient than if you coded it by hand.

Is it pure?

The class definition gives you an audit trail, so you can see from looking at the class which functions have permission to modify the private parts of the class. If a function is a friend, it means that it isn’t a member, but you want to give permission to modify private data anyway, and it must be listed in the class definition so everyone can see that it’s one of the privileged functions.

C++ is a hybrid object-oriented language, not a pure one, and friend was added to get around practical problems that crop up. It’s fine to point out that this makes the language less “pure,” because C++ is designed to be pragmatic, not to aspire to an abstract ideal.

Object layout

Chapter 4 stated that a struct written for a C compiler and later compiled with C++ would be unchanged. This referred primarily to the object layout of the struct, that is, where the storage for the individual variables is positioned in the memory allocated for the object. If the C++ compiler changed the layout of C structs, then any C code you wrote that inadvisably took advantage of knowledge of the positions of variables in the struct would break.

When you start using access specifiers, however, you’ve moved completely into the C++ realm, and things change a bit. Within a particular “access block” (a group of declarations delimited by access specifiers), the variables are guaranteed to be laid out contiguously, as in C. However, the access blocks may not appear in the object in the order that you declare them. Although the compiler will usually lay the blocks out exactly as you see them, there is no rule about it, because a particular machine architecture and/or operating environment may have explicit support for private and protected that might require those blocks to be placed in special memory locations. The language specification doesn’t want to restrict this kind of advantage.

Access specifiers are part of the structure and don’t affect the objects created from the structure. All of the access specification information disappears before the program is run; generally this happens during compilation. In a running program, objects become “regions of storage” and nothing more. If you really want to, you can break all the rules and access the memory directly, as you can in C. C++ is not designed to prevent you from doing unwise things. It just provides you with a much easier, highly desirable alternative.

In general, it’s not a good idea to depend on anything that’s implementation-specific when you’re writing a program. When you must have implementation-specific dependencies, encapsulate them inside a structure so that any porting changes are focused in one place.

The class

Access control is often referred to as implementation hiding. Including functions within structures (often referred to as encapsulation[36]) produces a data type with characteristics and behaviors, but access control puts boundaries within that data type, for two important reasons. The first is to establish what the client programmers can and can’t use. You can build your internal mechanisms into the structure without worrying that client programmers will think that these mechanisms are part of the interface they should be using.

This feeds directly into the second reason, which is to separate the interface from the implementation. If the structure is used in a set of programs, but the client programmers can’t do anything but send messages to the public interface, then you can change anything that’s private without requiring modifications to their code.

Encapsulation and access control, taken together, invent something more than a C struct. We’re now in the world of object-oriented programming, where a structure is describing a class of objects as you would describe a class of fishes or a class of birds: Any object belonging to this class will share these characteristics and behaviors. That’s what the structure declaration has become, a description of the way all objects of this type will look and act.

In the original OOP language, Simula-67, the keyword class was used to describe a new data type. This apparently inspired Stroustrup to choose the same keyword for C++, to emphasize that this was the focal point of the whole language: the creation of new data types that are more than just C structs with functions. This certainly seems like adequate justification for a new keyword.

However, the class keyword in C++ comes close to being unnecessary. It’s identical to the struct keyword in absolutely every way except one: class defaults to private, whereas struct defaults to public. Here are two structures that produce the same result:

//: C05:Class.cpp
// Similarity of struct and class

struct A {
private:
  int i, j, k;
public:
  int f();
  void g();
};

int A::f() { 
  return i + j + k; 
}

void A::g() { 
  i = j = k = 0; 
}

// Identical results are produced with:

class B {
  int i, j, k;
public:
  int f();
  void g();
};

int B::f() { 
  return i + j + k; 
}

void B::g() { 
  i = j = k = 0; 
} 

int main() {
  A a;
  B b;
  a.f(); a.g();
  b.f(); b.g();
} ///:~

The class is the fundamental OOP concept in C++. It is one of the keywords that will not be set in bold in this book – it becomes annoying with a word repeated as often as “class.” The shift to classes is so important that I suspect Stroustrup’s preference would have been to throw struct out altogether, but the need for backwards compatibility with C wouldn’t allow that.

Many people prefer a style of creating classes that is more struct-like than class-like, because you override the “default-to-private” behavior of the class by starting out with public elements:

class X {
public:
  void interface_function();
private:
  void private_function();
  int internal_representation;
}; 

The logic behind this is that it makes more sense for the reader to see the members of interest first, then they can ignore anything that says private. Indeed, the only reasons all the other members must be declared in the class at all are so the compiler knows how big the objects are and can allocate them properly, and so it can guarantee consistency.

The examples in this book, however, will put the private members first, like this:

class X {
  void private_function();
  int internal_representation;
public:
  void interface_function();
}; 

Some people even go to the trouble of decorating their own private names:

class Y {
public:
  void f();
private:
  int mX;  // "Self-decorated" name
}; 

Because mX is already hidden in the scope of Y, the m (for “member”) is unnecessary. However, in projects with many global variables (something you should strive to avoid, but which is sometimes inevitable in existing projects), it is helpful to be able to distinguish inside a member function definition which data is global and which is a member.

Modifying Stash to use access control

It makes sense to take the examples from Chapter 4 and modify them to use classes and access control. Notice how the client programmer portion of the interface is now clearly distinguished, so there’s no possibility of client programmers accidentally manipulating a part of the class that they shouldn’t.

//: C05:Stash.h
// Converted to use access control
#ifndef STASH_H
#define STASH_H

class Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  void inflate(int increase);
public:
  void initialize(int size);
  void cleanup();
  int add(void* element);
  void* fetch(int index);
  int count();
};
#endif // STASH_H ///:~

The inflate( ) function has been made private because it is used only by the add( ) function and is thus part of the underlying implementation, not the interface. This means that, sometime later, you can change the underlying implementation to use a different system for memory management.

Other than the name of the include file, the header above is the only thing that’s been changed for this example. The implementation file and test file are the same.

Modifying Stack to use access control

As a second example, here’s the Stack turned into a class. Now the nested data structure is private, which is nice because it ensures that the client programmer will neither have to look at it nor be able to depend on the internal representation of the Stack:

//: C05:Stack2.h
// Nested structs via linked list
#ifndef STACK2_H
#define STACK2_H

class Stack {
  struct Link {
    void* data;
    Link* next;
    void initialize(void* dat, Link* nxt);
  }* head;
public:
  void initialize();
  void push(void* dat);
  void* peek();
  void* pop();
  void cleanup();
};
#endif // STACK2_H ///:~

As before, the implementation doesn’t change and so it is not repeated here. The test, too, is identical. The only thing that’s been changed is the robustness of the class interface. The real value of access control is to prevent you from crossing boundaries during development. In fact, the compiler is the only thing that knows about the protection level of class members. There is no access control information mangled into the member name that carries through to the linker. All the protection checking is done by the compiler; it has vanished by runtime.

Notice that the interface presented to the client programmer is now truly that of a push-down stack. It happens to be implemented as a linked list, but you can change that without affecting what the client programmer interacts with, or (more importantly) a single line of client code.


Handle classes

Access control in C++ allows you to separate interface from implementation, but the implementation hiding is only partial. The compiler must still see the declarations for all parts of an object in order to create and manipulate it properly. You could imagine a programming language that requires only the public interface of an object and allows the private implementation to be hidden, but C++ performs type checking statically (at compile time) as much as possible. This means that you’ll learn as early as possible if there’s an error. It also means that your program is more efficient. However, including the private implementation has two effects: the implementation is visible even if you can’t easily access it, and it can cause needless recompilation.

Hiding the implementation

Some projects cannot afford to have their implementation visible to the client programmer. It may show strategic information in a library header file that the company doesn’t want available to competitors. You may be working on a system where security is an issue – an encryption algorithm, for example – and you don’t want to expose any clues in a header file that might help people to crack the code. Or you may be putting your library in a “hostile” environment, where the programmers will directly access the private components anyway, using pointers and casting. In all these situations, it’s valuable to have the actual structure compiled inside an implementation file rather than exposed in a header file.

Reducing recompilation

The project manager in your programming environment will cause a recompilation of a file if that file is touched (that is, modified) or if another file it’s dependent upon – that is, an included header file – is touched. This means that any time you make a change to a class, whether it’s to the public interface or to the private member declarations, you’ll force a recompilation of anything that includes that header file. This is often referred to as the fragile base-class problem. For a large project in its early stages this can be very unwieldy because the underlying implementation may change often; if the project is very big, the time for compiles can prohibit rapid turnaround.

The technique to solve this is sometimes called handle classes or the “Cheshire cat”[37] – everything about the implementation disappears except for a single pointer, the “smile.” The pointer refers to a structure whose definition is in the implementation file along with all the member function definitions. Thus, as long as the interface is unchanged, the header file is untouched. The implementation can change at will, and only the implementation file needs to be recompiled and relinked with the project.

Here’s a simple example demonstrating the technique. The header file contains only the public interface and a single pointer of an incompletely specified class:

//: C05:Handle.h
// Handle classes
#ifndef HANDLE_H
#define HANDLE_H

class Handle {
  struct Cheshire; // Class declaration only
  Cheshire* smile;
public:
  void initialize();
  void cleanup();
  int read();
  void change(int);
};
#endif // HANDLE_H ///:~

This is all the client programmer is able to see. The line

struct Cheshire;

is an incomplete type specification or a class declaration. (A class definition includes the body of the class.) It tells the compiler that Cheshire is a structure name, but it doesn’t give any details about the struct. This is only enough information to create a pointer to the struct; you can’t create an object until the structure body has been provided. In this technique, that structure body is hidden away in the implementation file:

//: C05:Handle.cpp {O}
// Handle implementation
#include "Handle.h"
#include "../require.h"

// Define Handle's implementation:
struct Handle::Cheshire {
  int i;
};

void Handle::initialize() {
  smile = new Cheshire;
  smile->i = 0;
}

void Handle::cleanup() {
  delete smile;
}

int Handle::read() {
  return smile->i;
}

void Handle::change(int x) {
  smile->i = x;
} ///:~

Cheshire is a nested structure, so it must be defined with scope resolution:

struct Handle::Cheshire {

In Handle::initialize( ), storage is allocated for a Cheshire structure, and in Handle::cleanup( ) this storage is released. This storage is used in lieu of all the data elements you’d normally put into the private section of the class. When you compile Handle.cpp, this structure definition is hidden away in the object file where no one can see it. If you change the elements of Cheshire, the only file that must be recompiled is Handle.cpp because the header file is untouched.

The use of Handle is like the use of any class: include the header, create objects, and send messages.

//: C05:UseHandle.cpp
//{L} Handle
// Use the Handle class
#include "Handle.h"

int main() {
  Handle u;
  u.initialize();
  u.read();
  u.change(1);
  u.cleanup();
} ///:~

The only thing the client programmer can access is the public interface, so as long as the implementation is the only thing that changes, the file above never needs recompilation. Thus, although this isn’t perfect implementation hiding, it’s a big improvement.

Summary

Access control in C++ gives valuable control to the creator of a class. The users of the class can clearly see exactly what they can use and what to ignore. More important, though, is the ability to ensure that no client programmer becomes dependent on any part of the underlying implementation of a class. If you know this as the creator of the class, you can change the underlying implementation with the knowledge that no client programmer will be affected by the changes because they can’t access that part of the class.

When you have the ability to change the underlying implementation, you can not only improve your design at some later time, but you also have the freedom to make mistakes. No matter how carefully you plan and design, you’ll make mistakes. Knowing that it’s relatively safe to make these mistakes means you’ll be more experimental, you’ll learn faster, and you’ll finish your project sooner.

The public interface to a class is what the client programmer does see, so that is the most important part of the class to get “right” during analysis and design. But even that allows you some leeway for change. If you don’t get the interface right the first time, you can add more functions, as long as you don’t remove any that client programmers have already used in their code.

Exercises

Solutions to selected exercises can be found in the electronic document The Thinking in C++ Annotated Solution Guide, available for a small fee from www.BruceEckel.com.

  1. Create a class with public, private, and protected data members and function members. Create an object of this class and see what kind of compiler messages you get when you try to access all the class members.
  2. Write a struct called Lib that contains three string objects a, b, and c. In main( ) create a Lib object called x and assign to x.a, x.b, and x.c. Print out the values. Now replace a, b, and c with an array of string s[3]. Show that your code in main( ) breaks as a result of the change. Now create a class called Libc, with private string objects a, b, and c, and member functions seta( ), geta( ), setb( ), getb( ), setc( ), and getc( ) to set and get the values. Write main( ) as before. Now change the private string objects a, b, and c to a private array of string s[3]. Show that the code in main( ) does not break as a result of the change.
  3. Create a class and a global friend function that manipulates the private data in the class.
  4. Write two classes, each of which has a member function that takes a pointer to an object of the other class. Create instances of both objects in main( ) and call the aforementioned member function in each class.
  5. Create three classes. The first class contains private data, and grants friendship to the entire second class and to a member function of the third class. In main( ), demonstrate that all of these work correctly.
  6. Create a Hen class. Inside this, nest a Nest class. Inside Nest, place an Egg class. Each class should have a display( ) member function. In main( ), create an instance of each class and call the display( ) function for each one.
  7. Modify Exercise 6 so that Nest and Egg each contain private data. Grant friendship to allow the enclosing classes access to this private data.
  8. Create a class with data members distributed among numerous public, private, and protected sections. Add a member function showMap( ) that prints the names of each of these data members and their addresses. If possible, compile and run this program on more than one compiler and/or computer and/or operating system to see if there are layout differences in the object.
  9. Copy the implementation and test files for Stash in Chapter 4 so that you can compile and test Stash.h in this chapter.
  10. Place objects of the Hen class from Exercise 6 in a Stash. Fetch them out and print them (if you have not already done so, you will need to add Hen::print( )).
  11. Copy the implementation and test files for Stack in Chapter 4 so that you can compile and test Stack2.h in this chapter.
  12. Place objects of the Hen class from Exercise 6 in a Stack. Fetch them out and print them (if you have not already done so, you will need to add Hen::print( )).
  13. Modify Cheshire in Handle.cpp, and verify that your project manager recompiles and relinks only this file, but doesn’t recompile UseHandle.cpp.
  14. Create a StackOfInt class (a stack that holds ints) using the “Cheshire cat” technique that hides the low-level data structure you use to store the elements in a class called StackImp. Implement two versions of StackImp: one that uses a fixed-length array of int, and one that uses a vector<int>. Have a preset maximum size for the stack so you don’t have to worry about expanding the array in the first version. Note that the StackOfInt.h class doesn’t have to change with StackImp.



[36] As noted before, sometimes access control is referred to as encapsulation.

[37] This name is attributed to John Carolan, one of the early pioneers in C++, and of course, Lewis Carroll. This technique can also be seen as a form of the “bridge” design pattern, described in Volume 2.

Last Update:09/27/2001