This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Coding Guideline

Tianyu Li edited this page Jul 31, 2018 · 29 revisions

Directory Structure

In the project root directory, there are three major places where you will put code:

  • src -- This is where the bulk of the code lives. Anything you expect to be compiled into the release should be here.
  • test -- This is where unit tests, benchmarks, and utility code for them live. Code in src should not depend on anything in test.
  • script -- This is where scripts that support development and testing live (e.g. the Python formatting script, dependency installation).

Almost never will you need to create new directories outside of these. Here are guidelines for adding files to each.

src


There can be at most two levels of directories under src. The first level is for general system components (e.g. storage, execution, network, sql, common), and the second level is either for a class of similar files or for a self-contained sub-component.

Translated into coding guidelines, you should rarely need to create a new first-level subdirectory, and should probably consult Andy if you believe you do. To create a new secondary directory, make sure you meet the following criteria:

  • There are more than two files (i.e. at least three) that you need to put into this folder.
  • Each file is stand-alone, i.e. either the contents don't make sense living in a single file, or putting them in a single file would make that file large and difficult to navigate. (This is open to interpretation, but if, for example, you have 3 files containing 10-line class definitions, maybe they should not be spread out that much.)

And one of the two:

  • The subdirectory is a self-contained sub-component. This probably means that the folder only has one outward facing API. A good rule of thumb is when outside code files only need to include one header from this folder, where said API is defined.
  • The subdirectory contains a logical grouping of files, and there are enough of them that leaving them ungrouped makes the upper level hard to navigate. (e.g. all the plans, all the common data structures, etc.) A good rule of thumb is if you have subdirectory As, you should be able to say with a straight face that everything under As is an A. (e.g. Everything under containers is a container)

Every class and function under these directories should be in a namespace. All code lives under namespace terrier, and within it under a namespace named after its first-level directory (e.g. common, storage). Second-level subdirectories do not have associated namespaces.
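As a rough sketch of the namespace rule, assume a hypothetical header at src/storage/container/example_bitmap.h. The file name and the ExampleBitmap class are made up for illustration and are not part of the codebase; only the namespace layout is the point.

```cpp
#include <cstdint>

// Hypothetical file: src/storage/container/example_bitmap.h
// First-level directory "storage" -> namespace terrier::storage.
// The second-level directory "container" does NOT introduce a namespace.
// (Nested namespace syntax requires C++17.)
namespace terrier::storage {

// Illustrative class only. Tracks up to 64 bits; pos must be < 64.
class ExampleBitmap {
 public:
  void Set(uint32_t pos) { bits_ |= (uint64_t{1} << pos); }
  bool Test(uint32_t pos) const { return ((bits_ >> pos) & 1U) != 0; }

 private:
  uint64_t bits_ = 0;
};

}  // namespace terrier::storage
```

Code elsewhere would refer to this class as terrier::storage::ExampleBitmap, regardless of which second-level directory the header sits in.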

test


TBD

script


TBD

Dependency Injection

Singletons

Ask us how we know, but Singletons, and hard-coded dependencies in general, do wonders to make a codebase hard to understand, hard to test, and hard to maintain.

The solution is to use a specific style of coding to avoid having to deal with this. Dependency Injection is a widely used paradigm in industry for this problem, and it is easier than it sounds.

Although we don't have the need for a full DI framework yet, we should strive to write code in a way that is amenable to future changes. Read the article linked to above if you are interested, but otherwise, here is a quick example of how these things work. Suppose we want to write a LinkedList, but linked list nodes will be reused:

struct LinkedListNode {
 // ... 
};

class LinkedList {
 public:
  // ...
  template <typename T>
  void Add(T content) {
    // ...
    // Hard-coded dependency on the global singleton.
    LinkedListNodeObjectPool::GetInstance().New(content);
  }
};

At this point, LinkedList hides the fact that it mutates the state of the global singleton LinkedListNodeObjectPool, and because it makes an explicit call to the singleton, it cannot be tested in isolation. Now consider a scenario where for some reason LinkedListNodeObjectPool hands out large chunks of memory (say, 1 MB for each node), and we want to do scale testing on the operations of LinkedList (e.g. concurrent Insert and Delete). Our test will have to run slowly, because there is no way for us to change the memory behavior of LinkedListNodeObjectPool without changing the code. In an ideal world, we know that the content of each LinkedListNode does not matter in this test, and we could get away with a fake object that just has the pointer field.

Suppose we have written the code instead in this way:

class LinkedList {
 public:
  // The pool is injected by the caller and must outlive this list.
  explicit LinkedList(LinkedListNodeObjectPool &pool) : pool_(pool) {}
  // ...
  template <typename T>
  void Add(T content) {
    // ...
    pool_.New(content);
  }

  // ...
 private:
  LinkedListNodeObjectPool &pool_;
};

Everything still works. But now, if we want to write the above test, we simply do:

// fake implementation with low memory overhead
class FakeLinkedListNodeObjectPool : public LinkedListNodeObjectPool {
  // ...
};

TEST(LinkedListTests, LargeTest) {
  FakeLinkedListNodeObjectPool fake_pool;
  LinkedList tested(fake_pool);
  // test on tested
}

When we execute the real program, presumably in main.cpp:

int main() {
  // ...
  LinkedListNodeObjectPool real_pool;
  LinkedList tested(real_pool);
  // ...
}

To sum it up, dependency injection states that object creation and logic should be separated. This allows us to change the object without changing the logic. Of course, in practice, there is no need to be this strict about it; it might make sense, for example, for an object to create and own another object when the owned object is essential to its functionality (e.g. rarely do we want to change a std::vector implementation, so creating a vector member in the object is fine). However, if the object in question is logically a different component, and it might make sense for the logic to be tested apart from that object, inject the dependency as shown above.
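The snippets above elide how the fake actually replaces the real pool. The following is one possible self-contained sketch, under assumptions that are not in the original: New() is declared virtual so a subclass can override it, the node content is fixed to int instead of a template parameter (member templates cannot be virtual), and the NumCalls() counter and Size() accessor are hypothetical helpers added so the behavior can be observed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct LinkedListNode {
  int content = 0;
  LinkedListNode *next = nullptr;
};

// The injected dependency. New() is virtual (an assumption of this sketch)
// so that tests can substitute a different implementation.
class LinkedListNodeObjectPool {
 public:
  virtual ~LinkedListNodeObjectPool() = default;

  // Allocates a node owned by the pool and returns a non-owning pointer.
  virtual LinkedListNode *New(int content) {
    nodes_.push_back(std::make_unique<LinkedListNode>());
    nodes_.back()->content = content;
    return nodes_.back().get();
  }

 private:
  std::vector<std::unique_ptr<LinkedListNode>> nodes_;
};

// Test double: behaves like the base pool but also records how many nodes
// were requested. A real fake could instead skip expensive allocation.
class FakeLinkedListNodeObjectPool : public LinkedListNodeObjectPool {
 public:
  LinkedListNode *New(int content) override {
    ++num_calls_;
    return LinkedListNodeObjectPool::New(content);
  }
  std::size_t NumCalls() const { return num_calls_; }

 private:
  std::size_t num_calls_ = 0;
};

class LinkedList {
 public:
  // The pool is passed in rather than looked up globally.
  // It must outlive the list.
  explicit LinkedList(LinkedListNodeObjectPool &pool) : pool_(pool) {}

  // Prepends a node obtained from the injected pool.
  void Add(int content) {
    LinkedListNode *node = pool_.New(content);
    node->next = head_;
    head_ = node;
    ++size_;
  }

  std::size_t Size() const { return size_; }

 private:
  LinkedListNodeObjectPool &pool_;
  LinkedListNode *head_ = nullptr;
  std::size_t size_ = 0;
};
```

Because LinkedList only sees a LinkedListNodeObjectPool reference, a test can construct it with a FakeLinkedListNodeObjectPool and production code can construct it with the real pool, without touching the list logic.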

Code reuse

Copy and Pasting Code

If you are on this project, we should not need to explain to you why copy-and-pasting code is a bad idea. That being said, when deadlines start approaching, it is normal for people to start cutting corners.

If you ever find yourself in the situation where you are about to copy and paste a body of code (more than a couple of lines), stop immediately and ask yourself the following questions:

  • Is it the only time I will need to do it?
  • Am I the only person who will need to do it?
  • Is there no higher-level abstraction for the block as a whole?

If you answer no to any of the questions (chances are you will), stop and write a function for said block of code, and leave enough documentation and extension points such that nobody will need to revisit this process.
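As an illustration of what "write a function for said block" might look like, here is a small sketch. The helper AlignUp and the two call sites are hypothetical and not taken from the codebase; they stand in for any snippet that would otherwise be pasted in several places.

```cpp
#include <cstdint>

// Hypothetical helper extracted from logic that would otherwise be
// copy-pasted at every call site.
// Rounds `size` up to the next multiple of `alignment`.
// Precondition: `alignment` is a nonzero power of two.
constexpr uint64_t AlignUp(uint64_t size, uint64_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Two hypothetical call sites that now share the helper instead of
// each carrying their own copy of the bit arithmetic.
constexpr uint64_t kCacheLineSize = 64;

constexpr uint64_t PaddedTupleSize(uint64_t tuple_size) {
  return AlignUp(tuple_size, 8);
}

constexpr uint64_t PaddedHeaderSize(uint64_t header_size) {
  return AlignUp(header_size, kCacheLineSize);
}
```

The documented precondition and the single definition mean a later fix (for example, supporting alignments that are not powers of two) happens in one place.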

If you do not do so, chances are your code reviewer's eyes will bleed and they will ask you to do so anyway. In that case you have not cut any corners after all, and you have caused pain to both yourself and your fellow students (and professors).