Design and Implementation of a C++ Memory Pool

1. Overview

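In C/C++, memory management is a notoriously thorny problem. Almost any program we write has to allocate memory, and that immediately raises questions: is there enough memory to allocate? What happens when an allocation fails? How should the program keep track of its own memory usage? In highly available software, simply requesting memory from the operating system and exiting when memory runs out is clearly unreasonable. A better approach is to consider, when memory is short, how to manage and optimize the memory the program already holds, which makes the software more robust. In this project we implement a memory pool and use a stack structure to measure the allocation performance it delivers. The goal is for the stack backed by our memory pool to perform far better than one using std::allocator or std::vector, as shown in the figure below: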

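Topics covered in this project

std::allocator, the memory allocator in C++

Memory pool techniques

Implementing a template linked stack by hand
Performance comparison between a linked stack and a vector-based stack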

Introduction to Memory Pools

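A memory pool is one form of pooling. When writing programs we usually use the new and delete keywords to request memory, and the consequence is that every allocation and deallocation has to go through the general-purpose heap allocator, and possibly operating system calls, to obtain memory from the heap. If these operations are too frequent, they can produce a large amount of memory fragmentation, which degrades allocation performance and can even cause allocations to fail.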

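The memory pool is a technique created to solve this problem. Conceptually, requesting memory is nothing more than asking an allocator for a pointer. When the request goes to the operating system, the system has to perform complex memory management and scheduling before it can hand back a suitable pointer, and during that process we also face the risk that the allocation fails.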

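So every allocation costs some amount of time. Call that time T; then n allocations cost nT in total. If instead we determine at the outset roughly how much memory we may need and allocate one region of that size up front, then whenever we need memory we simply take it from the region already allocated, and the total time spent requesting memory from the system is only T. The larger n is, the more time is saved.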
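
To make the nT versus T argument concrete, here is a minimal sketch. It assumes a hypothetical Arena class (not part of this project) that requests a single buffer up front and then serves each request by advancing an offset, so the system allocator is involved only once.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical illustration type: one up-front allocation, then cheap bumps.
class Arena {
  public:
    explicit Arena(std::size_t bytes)
        : buffer_(static_cast<char*>(::operator new(bytes))),
          size_(bytes), used_(0) {}
    ~Arena() { ::operator delete(buffer_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Hands out `bytes` bytes from the pre-allocated buffer, or nullptr if
    // the buffer is exhausted. The size is rounded up to a multiple of
    // alignof(std::max_align_t) so every returned pointer stays aligned.
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > size_ - used_) return nullptr;
        void* p = buffer_ + used_;
        used_ += rounded;
        return p;
    }

    std::size_t used() const { return used_; }

  private:
    char* buffer_;
    std::size_t size_;
    std::size_t used_;
};
```

With an Arena of 1024 bytes, two calls to allocate(sizeof(int)) return two distinct pointers without touching the system allocator again. This sketch never reuses released memory; the pool built later in the article adds that through a free list.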

2. Designing the Main Function

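We want to design and implement a high-performance memory pool, so we naturally need to compare it against an existing allocator. To compare allocation performance we need a structure that allocates memory dynamically (for example, a linked stack). With that in mind we can write the following code: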

#include <iostream>   // std::cout, std::endl
#include <cassert>    // assert()
#include <ctime>      // clock()
#include <vector>     // std::vector

#include "MemoryPool.hpp"  // MemoryPool
#include "StackAlloc.hpp"  // StackAlloc

// Number of elements to insert
#define ELEMS 10000000
// Number of repetitions
#define REPS 100

int main()
{
    clock_t start;

    // Use the STL default allocator
    StackAlloc<int, std::allocator<int> > stackDefault;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackDefault.empty());
        for (int i = 0; i < ELEMS; i++)
          stackDefault.push(i);
        for (int i = 0; i < ELEMS; i++)
          stackDefault.pop();
    }
    std::cout << "Default Allocator Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    // Use the memory pool
    StackAlloc<int, MemoryPool<int> > stackPool;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackPool.empty());
        for (int i = 0; i < ELEMS; i++)
          stackPool.push(i);
        for (int i = 0; i < ELEMS; i++)
          stackPool.pop();
    }
    std::cout << "MemoryPool Allocator Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    return 0;
}

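In the two blocks of code above, StackAlloc is a linked stack that takes two template parameters: the first is the type of the elements stored in the stack, and the second is the memory allocator the stack uses.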

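The allocator template parameter is therefore the only variable in the whole comparison: the default-allocator version passes std::allocator, and the memory-pool version passes MemoryPool.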

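std::allocator is the default allocator provided by the C++ standard library. Its distinguishing feature is this: when we use new to allocate memory for a new object, a constructor of the class is necessarily invoked, whereas std::allocator separates memory allocation from object construction, so the memory it hands out is raw and unconstructed.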
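
The separation described above can be seen in a few lines. This is only a sketch: the function name allocate_then_construct is made up for illustration, and it goes through std::allocator_traits rather than calling construct() and destroy() on the allocator directly, because those two members of std::allocator were removed in C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Allocates raw storage for one std::string, constructs it, reads it,
// then destroys it and releases the storage, as four separate steps.
inline std::size_t allocate_then_construct() {
    std::allocator<std::string> alloc;
    typedef std::allocator_traits<std::allocator<std::string> > Traits;

    std::string* p = Traits::allocate(alloc, 1);   // raw, unconstructed memory
    Traits::construct(alloc, p, "memory pool");    // the constructor runs here
    const std::size_t length = p->size();
    Traits::destroy(alloc, p);                     // the destructor runs here
    Traits::deallocate(alloc, p, 1);               // the memory is returned here
    return length;
}
```

A plain new std::string("memory pool") would perform the first two steps together, and delete would perform the last two together; an allocator-aware container needs them apart so it can hold capacity that contains no objects yet.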

Next we implement this linked stack.

3. The Template Linked Stack

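A stack has a very simple structure with no complicated logic. Its member functions only need to cover two basic operations: push and pop. For convenience we also want a few more methods: checking whether the stack is empty, clearing the stack, and getting the top element.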

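#include <memory>

template <typename T>
struct StackNode_
{
  T data;
  StackNode_* prev;
};

// T is the stored object type, Alloc is the allocator to use;
// std::allocator is the default allocator for the objects
template <typename T, typename Alloc = std::allocator<T> >
class StackAlloc
{
  public:
    // Use typedefs to shorten type names
    typedef StackNode_<T> Node;
    // Note: std::allocator::rebind was removed in C++20, so build as C++11/14
    typedef typename Alloc::template rebind<Node>::other allocator;

    // Default constructor
    StackAlloc() { head_ = 0; }
    // Destructor
    ~StackAlloc() { clear(); }

    // Returns true when the stack is empty
    bool empty() {return (head_ == 0);}

    // Releases the memory of every element in the stack
    void clear();

    // Push
    void push(T element);

    // Pop
    T pop();

    // Returns the top element
    T top() { return (head_->data); }

  private:
    // Allocator used for the nodes
    allocator allocator_;
    // Top of the stack
    Node* head_;
};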

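Simple logic such as construction, destruction, the empty check, and returning the top element is implemented directly in the definition above. Now let's implement the three important operations, clear(), push(), and pop():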

// Releases the memory of every element in the stack
void clear() {
  Node* curr = head_;
  // Pop the nodes one by one
  while (curr != 0)
  {
    Node* tmp = curr->prev;
    // Destroy first, then reclaim the memory
    allocator_.destroy(curr);
    allocator_.deallocate(curr, 1);
    curr = tmp;
  }
  head_ = 0;
}

// Push
void push(T element) {
  // Allocate memory for one node
  Node* newNode = allocator_.allocate(1);
  // Call the node's constructor
  allocator_.construct(newNode, Node());

  // Push operation
  newNode->data = element;
  newNode->prev = head_;
  head_ = newNode;
}

// Pop
T pop() {
  // Pop operation: return the popped element
  T result = head_->data;
  Node* tmp = head_->prev;
  allocator_.destroy(head_);
  allocator_.deallocate(head_, 1);
  head_ = tmp;
  return result;
}

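That completes the template linked stack. We can now comment out the memory-pool part of main() and measure the allocation behavior of the linked stack on its own, which gives a result like this: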

With the default allocator std::allocator, under the settings

#define ELEMS 10000000
#define REPS 100

the run took nearly one minute in total.

If that is longer than you want to wait, you can try reducing these two values.

Summary

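In this section we implemented a template linked stack for performance comparison; the code so far is shown below. In the next section we implement the high-performance memory pool in detail.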

// StackAlloc.hpp
#ifndef STACK_ALLOC_H
#define STACK_ALLOC_H

#include <memory>

template <typename T>
struct StackNode_
{
  T data;
  StackNode_* prev;
};

// T is the stored object type, Alloc is the allocator to use;
// std::allocator is the default allocator for the objects
template <class T, class Alloc = std::allocator<T> >
class StackAlloc
{
  public:
    // Use typedefs to shorten type names
    typedef StackNode_<T> Node;
    // Note: std::allocator::rebind was removed in C++20, so build as C++11/14
    typedef typename Alloc::template rebind<Node>::other allocator;

    // Default constructor
    StackAlloc() { head_ = 0; }
    // Destructor
    ~StackAlloc() { clear(); }

    // Returns true when the stack is empty
    bool empty() {return (head_ == 0);}

    // Releases the memory of every element in the stack
    void clear() {
      Node* curr = head_;
      while (curr != 0)
      {
        Node* tmp = curr->prev;
        allocator_.destroy(curr);
        allocator_.deallocate(curr, 1);
        curr = tmp;
      }
      head_ = 0;
    }

    // Push
    void push(T element) {
      // Allocate memory for one node
      Node* newNode = allocator_.allocate(1);
      // Call the node's constructor
      allocator_.construct(newNode, Node());

      // Push operation
      newNode->data = element;
      newNode->prev = head_;
      head_ = newNode;
    }

    // Pop
    T pop() {
      // Pop operation: return the popped element
      T result = head_->data;
      Node* tmp = head_->prev;
      allocator_.destroy(head_);
      allocator_.deallocate(head_, 1);
      head_ = tmp;
      return result;
    }

    // Returns the top element
    T top() { return (head_->data); }

  private:
    allocator allocator_;
    Node* head_;
};

#endif // STACK_ALLOC_H

// main.cpp
#include <iostream>
#include <cassert>
#include <ctime>
#include <vector>

// #include "MemoryPool.hpp"
#include "StackAlloc.hpp"

// Adjust these values to suit your machine
// Number of elements to insert
#define ELEMS 25000000
// Number of repetitions
#define REPS 50

int main()
{
    clock_t start;

    // Use the default allocator
    StackAlloc<int, std::allocator<int> > stackDefault;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackDefault.empty());
        for (int i = 0; i < ELEMS; i++)
          stackDefault.push(i);
        for (int i = 0; i < ELEMS; i++)
          stackDefault.pop();
    }
    std::cout << "Default Allocator Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    // Use the memory pool
    // StackAlloc<int, MemoryPool<int> > stackPool;
    // start = clock();
    // for (int j = 0; j < REPS; j++) {
    //     assert(stackPool.empty());
    //     for (int i = 0; i < ELEMS; i++)
    //       stackPool.push(i);
    //     for (int i = 0; i < ELEMS; i++)
    //       stackPool.pop();
    // }
    // std::cout << "MemoryPool Allocator Time: ";
    // std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    return 0;
}

2. Designing the Memory Pool

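In the previous section, the template linked stack used the default allocator to manage the memory of the elements involved in stack operations. In total it relied on these key interfaces: rebind::other, allocate(), deallocate(), construct(), and destroy(). So that our code works as a drop-in replacement, the memory pool should provide the same interfaces: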

#ifndef MEMORY_POOL_HPP
#define MEMORY_POOL_HPP

#include <cstddef>   // size_t
#include <cstdint>   // uintptr_t
#include <new>       // placement new
#include <utility>   // std::forward

template <typename T, size_t BlockSize = 4096>
class MemoryPool
{
  public:
    // Use a typedef to shorten the type name
    typedef T*              pointer;

    // Define the rebind<U>::other interface
    template <typename U> struct rebind {
      typedef MemoryPool<U> other;
    };

    // Default constructor: initialize all the slot pointers
    // C++11 uses noexcept to declare explicitly that this function does not throw
    MemoryPool() noexcept {
      currentBlock_ = nullptr;
      currentSlot_ = nullptr;
      lastSlot_ = nullptr;
      freeSlots_ = nullptr;
    }

    // Destroy an existing memory pool
    ~MemoryPool() noexcept;

    // Only one object can be allocated at a time; n and hint are ignored
    pointer allocate(size_t n = 1, const T* hint = 0);

    // Release the slot that p points to
    void deallocate(pointer p, size_t n = 1);

    // Call the constructor
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args);

    // Destroy an object in the pool, i.e. call its destructor
    template <typename U>
    void destroy(U* p) {
      p->~U();
    }

  private:
    // Stores one object slot in the pool:
    // it either holds an object,
    // or holds a pointer to another object slot
    union Slot_ {
      T element;
      Slot_* next;
    };

    // Data pointer
    typedef char* data_pointer_;
    // Object slot
    typedef Slot_ slot_type_;
    // Pointer to an object slot
    typedef Slot_* slot_pointer_;

    // Points to the current memory block
    slot_pointer_ currentBlock_;
    // Points to an object slot in the current memory block
    slot_pointer_ currentSlot_;
    // Points to the last object slot in the current memory block
    slot_pointer_ lastSlot_;
    // Points to the free object slots
    slot_pointer_ freeSlots_;

    // Check that the configured block size is not too small
    static_assert(BlockSize >= 2 * sizeof(slot_type_), "BlockSize too small.");
};

#endif // MEMORY_POOL_HPP
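
The rebind member in the class above exists because the stack is handed an allocator for T but actually needs to allocate nodes. The sketch below shows the mechanism in isolation, assuming a hypothetical ToyAlloc and a simplified Node that are not part of the project.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical minimal allocator, used only to show what rebind does.
template <typename T>
struct ToyAlloc {
    template <typename U> struct rebind { typedef ToyAlloc<U> other; };
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// Simplified stand-in for the stack's node type.
struct Node { int data; Node* prev; };

// The container receives an allocator for int but stores Node objects,
// so it asks the allocator family for the matching Node allocator.
typedef ToyAlloc<int>::rebind<Node>::other NodeAlloc;
static_assert(std::is_same<NodeAlloc, ToyAlloc<Node> >::value,
              "rebind maps ToyAlloc<int> to ToyAlloc<Node>");
```

Inside a template such as StackAlloc the same lookup needs the extra typename and template keywords, because Alloc is a dependent type there.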

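As the class design above shows, the pool uses linked lists to manage its memory blocks. The pool first defines a basic memory block (Block) of fixed size, and within it defines a union, Slot_, that can be used either as a slot storing an object or as a pointer to another slot. The pool then keeps four key pointers, whose roles are: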

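currentBlock_: points to the current memory block

currentSlot_: points to the next unused object slot in the current memory block

lastSlot_: marks the last object slot in the current memory block; once currentSlot_ reaches it, the block is used up

freeSlots_: points to the head of the list of free object slots, that is, slots that were handed out and later released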
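
The union-based slot and the free list behind freeSlots_ can be tried out on their own. The following is a sketch under simplifying assumptions: a hypothetical TinyFreeList with a fixed array of four int slots and no block allocation at all.

```cpp
#include <cassert>
#include <cstddef>

// A slot holds an element while in use and a `next` link while free,
// so the free list costs no memory beyond the slots themselves.
union Slot {
    int element;
    Slot* next;
};

// Hypothetical fixed-capacity store, for illustration only.
class TinyFreeList {
  public:
    static const std::size_t kSlots = 4;

    TinyFreeList() : free_(nullptr) {
        // Thread every slot onto the free list up front.
        for (std::size_t i = 0; i < kSlots; ++i) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    // Pops a slot off the free list; returns nullptr when none are left.
    int* acquire() {
        if (free_ == nullptr) return nullptr;
        Slot* s = free_;
        free_ = s->next;
        s->element = 0;  // the slot now stores an element
        return &s->element;
    }
    // Pushes the slot back; its storage is reused for the link.
    void release(int* p) {
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }

  private:
    Slot slots_[kSlots];
    Slot* free_;
};
```

Because release() pushes onto the head of the list and acquire() pops from the head, the most recently released slot is the first one handed out again.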

With the overall design of the memory pool sorted out, we can start implementing the key logic.

3. Implementation

Implementing MemoryPool::construct()

MemoryPool::construct() has the simplest logic of all: all it has to do is call the constructor of the object being created. Therefore:

// Call the constructor, using std::forward to forward the variadic template arguments
template <typename U, typename... Args>
void construct(U* p, Args&&... args) {
    new (p) U (std::forward<Args>(args)...);
}
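
As a standalone illustration of the placement new and perfect forwarding used by construct(), here is a sketch with two hypothetical helpers, construct_at_address and build_in_place, that build a std::string inside storage the caller already owns.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Same shape as the construct() above, written as a free function.
template <typename U, typename... Args>
void construct_at_address(U* p, Args&&... args) {
    new (p) U(std::forward<Args>(args)...);
}

// Builds a std::string inside a local buffer and returns its size.
inline std::size_t build_in_place() {
    typedef std::string String;
    alignas(String) unsigned char storage[sizeof(String)];
    String* s = reinterpret_cast<String*>(storage);
    construct_at_address(s, 3, 'x');   // runs std::string(3, 'x')
    const std::size_t n = s->size();
    s->~String();                      // destructor only; nothing is freed
    return n;
}
```

No heap allocation happens for the string object itself: placement new only runs the constructor at the given address, which is why the pool can pair it with memory it obtained earlier.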

Implementing MemoryPool::deallocate()

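MemoryPool::deallocate() is called only after the object in the slot has been destroyed. Its main purpose is to release the slot, which it does by pushing the slot onto the head of the free list. The logic is not complicated either: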

// Release the slot that p points to
void deallocate(pointer p, size_t n = 1) {
  if (p != nullptr) {
    // reinterpret_cast is a forced type conversion operator
    // To access next, p must be cast to slot_pointer_
    reinterpret_cast<slot_pointer_>(p)->next = freeSlots_;
    freeSlots_ = reinterpret_cast<slot_pointer_>(p);
  }
}

Implementing MemoryPool::~MemoryPool()

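The destructor is responsible for destroying the whole memory pool, so we need to free, one by one, the memory blocks originally requested from the operating system: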

// Destroy an existing memory pool
~MemoryPool() noexcept {
  // Loop over the memory blocks allocated by the pool and free each one
  slot_pointer_ curr = currentBlock_;
  while (curr != nullptr) {
    slot_pointer_ prev = curr->next;
    operator delete(reinterpret_cast<void*>(curr));
    curr = prev;
  }
}
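
The traversal in the destructor depends on every block storing, at its very start, a pointer to the block allocated before it. A reduced sketch of that bookkeeping, using a hypothetical BlockChain class that manages blocks only and hands out no slots:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// The header placed at the start of every raw block.
struct BlockHeader { BlockHeader* next; };

// Hypothetical chain of raw blocks, for illustration only.
class BlockChain {
  public:
    BlockChain() : head_(nullptr) {}
    ~BlockChain() { release_all(); }
    BlockChain(const BlockChain&) = delete;
    BlockChain& operator=(const BlockChain&) = delete;

    // Allocates one raw block and links it in front of the chain.
    void add_block(std::size_t bytes) {
        assert(bytes >= sizeof(BlockHeader));
        void* raw = ::operator new(bytes);
        head_ = new (raw) BlockHeader{head_};
    }
    // Frees every block and returns how many were freed. The `next` link
    // is read before the block that contains it is released.
    std::size_t release_all() {
        std::size_t freed = 0;
        while (head_ != nullptr) {
            BlockHeader* next = head_->next;
            ::operator delete(static_cast<void*>(head_));
            head_ = next;
            ++freed;
        }
        return freed;
    }

  private:
    BlockHeader* head_;
};
```

Reading the link first matters: once a block has been passed to operator delete, its contents, including the pointer to the previous block, must not be touched.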

Implementing MemoryPool::allocate()

MemoryPool::allocate() is without doubt the heart of the memory pool, but once the overall design is clear its implementation is not complicated. Here it is:

// Only one object can be allocated at a time; n and hint are ignored
pointer allocate(size_t n = 1, const T* hint = 0) {
  // If there is a free object slot, hand it out directly
  if (freeSlots_ != nullptr) {
    pointer result = reinterpret_cast<pointer>(freeSlots_);
    freeSlots_ = freeSlots_->next;
    return result;
  } else {
    // If the current block has run out of slots, allocate a new memory block
    if (currentSlot_ >= lastSlot_) {
      // Allocate a new memory block and link it to the previous one
      data_pointer_ newBlock = reinterpret_cast<data_pointer_>(operator new(BlockSize));
      reinterpret_cast<slot_pointer_>(newBlock)->next = currentBlock_;
      currentBlock_ = reinterpret_cast<slot_pointer_>(newBlock);
      // Pad the start of the block body to satisfy the alignment requirement of the slots
      data_pointer_ body = newBlock + sizeof(slot_pointer_);
      uintptr_t result = reinterpret_cast<uintptr_t>(body);
      size_t bodyPadding = (alignof(slot_type_) - result) % alignof(slot_type_);
      currentSlot_ = reinterpret_cast<slot_pointer_>(body + bodyPadding);
      lastSlot_ = reinterpret_cast<slot_pointer_>(newBlock + BlockSize - sizeof(slot_type_) + 1);
    }
    return reinterpret_cast<pointer>(currentSlot_++);
  }
}
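
The padding expression in allocate() is easy to misread, because alignof(slot_type_) - result is usually "negative" in unsigned arithmetic. The helper below, with the hypothetical name padding_for, isolates the same formula so it can be checked with concrete numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes to add to `address` to reach the next multiple of `alignment`.
// Assumes `alignment` is a power of two, which alignof always yields.
// The subtraction may wrap around; unsigned wrap-around is well defined,
// and since 2^N is a multiple of `alignment` the remainder is still correct.
inline std::size_t padding_for(std::uintptr_t address, std::size_t alignment) {
    return static_cast<std::size_t>((alignment - address) % alignment);
}
```

For example, with an alignment of 8, address 16 needs 0 bytes of padding, address 17 needs 7, and address 20 needs 4.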

4. Performance Comparison with std::vector

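For a stack, a linked implementation is actually not the best choice: this structure inevitably involves pointer operations and also spends extra space on the pointers between nodes. In fact, we can simulate a stack with the push_back() and pop_back() operations of std::vector. Let's compare std::vector with our memory pool and see which performs better, by adding the following code to the main function: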

    // Compare the performance of the memory pool and std::vector
    std::vector<int> stackVector;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackVector.empty());
        for (int i = 0; i < ELEMS; i++)
          stackVector.push_back(i);
        for (int i = 0; i < ELEMS; i++)
          stackVector.pop_back();
    }
    std::cout << "Vector Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

Now when we recompile the code, the gap becomes visible:

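The linked stack using the default allocator is the slowest. Next comes the stack simulated with std::vector, which cuts the time substantially compared with that linked stack.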

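std::vector is implemented in a way that is somewhat similar to a memory pool: when a std::vector runs out of space, it abandons its current memory region, requests a larger one, and copies all the data from the current region into the new one.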
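
The growth behavior described above can be observed by watching capacity() change. The function count_reallocations below is a hypothetical helper written for this illustration; the exact growth factor is implementation-defined, so it only counts how often the buffer changed rather than assuming particular sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times push_back moved to a larger buffer while
// inserting `count` ints. `reserve_first` pre-sizes the buffer.
inline std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(count);
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Without reserve() the count is small compared with the number of insertions, and with reserve() it is zero. Since pop_back() does not reduce capacity, the vector in the benchmark loop would be expected to grow only during the first repetition and reuse the same buffer afterwards.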

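Finally, the memory pool we implemented takes the least time, that is, it has the best allocation performance, which completes the project.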

Summary

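In this section we implemented the memory pool left unfinished in the previous section and met the goals of the whole project. The memory pool is both compact and efficient. Its complete code is as follows: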

#ifndef MEMORY_POOL_HPP
#define MEMORY_POOL_HPP

#include <cstddef>   // size_t
#include <cstdint>   // uintptr_t
#include <new>       // placement new
#include <utility>   // std::forward

template <typename T, size_t BlockSize = 4096>
class MemoryPool
{
  public:
    // Use a typedef to shorten the type name
    typedef T*              pointer;

    // Define the rebind<U>::other interface
    template <typename U> struct rebind {
      typedef MemoryPool<U> other;
    };

    // Default constructor
    // C++11 uses noexcept to declare explicitly that this function does not throw
    MemoryPool() noexcept {
      currentBlock_ = nullptr;
      currentSlot_ = nullptr;
      lastSlot_ = nullptr;
      freeSlots_ = nullptr;
    }

    // Destroy an existing memory pool
    ~MemoryPool() noexcept {
      // Loop over the memory blocks allocated by the pool and free each one
      slot_pointer_ curr = currentBlock_;
      while (curr != nullptr) {
        slot_pointer_ prev = curr->next;
        operator delete(reinterpret_cast<void*>(curr));
        curr = prev;
      }
    }

    // Only one object can be allocated at a time; n and hint are ignored
    pointer allocate(size_t n = 1, const T* hint = 0) {
      if (freeSlots_ != nullptr) {
        pointer result = reinterpret_cast<pointer>(freeSlots_);
        freeSlots_ = freeSlots_->next;
        return result;
      }
      else {
        if (currentSlot_ >= lastSlot_) {
          // Allocate a memory block
          data_pointer_ newBlock = reinterpret_cast<data_pointer_>(operator new(BlockSize));
          reinterpret_cast<slot_pointer_>(newBlock)->next = currentBlock_;
          currentBlock_ = reinterpret_cast<slot_pointer_>(newBlock);
          data_pointer_ body = newBlock + sizeof(slot_pointer_);
          uintptr_t result = reinterpret_cast<uintptr_t>(body);
          size_t bodyPadding = (alignof(slot_type_) - result) % alignof(slot_type_);
          currentSlot_ = reinterpret_cast<slot_pointer_>(body + bodyPadding);
          lastSlot_ = reinterpret_cast<slot_pointer_>(newBlock + BlockSize - sizeof(slot_type_) + 1);
        }
        return reinterpret_cast<pointer>(currentSlot_++);
      }
    }

    // Release the slot that p points to
    void deallocate(pointer p, size_t n = 1) {
      if (p != nullptr) {
        reinterpret_cast<slot_pointer_>(p)->next = freeSlots_;
        freeSlots_ = reinterpret_cast<slot_pointer_>(p);
      }
    }

    // Call the constructor, using std::forward to forward the variadic template arguments
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
      new (p) U (std::forward<Args>(args)...);
    }

    // Destroy an object in the pool, i.e. call its destructor
    template <typename U>
    void destroy(U* p) {
      p->~U();
    }

  private:
    // Stores one object slot in the pool
    union Slot_ {
      T element;
      Slot_* next;
    };

    // Data pointer
    typedef char* data_pointer_;
    // Object slot
    typedef Slot_ slot_type_;
    // Pointer to an object slot
    typedef Slot_* slot_pointer_;

    // Points to the current memory block
    slot_pointer_ currentBlock_;
    // Points to an object slot in the current memory block
    slot_pointer_ currentSlot_;
    // Points to the last object slot in the current memory block
    slot_pointer_ lastSlot_;
    // Points to the free object slots
    slot_pointer_ freeSlots_;
    // Check that the configured block size is not too small
    static_assert(BlockSize >= 2 * sizeof(slot_type_), "BlockSize too small.");
};

#endif // MEMORY_POOL_HPP

Lab source: shiyanlou.com/courses/r

Project source: github.com/cacay/Memory

Reviewing editor: 湯梓紅
