Manages device memory by preallocating a large amount of device memory and handing out chunks to requesters.
#include <MNCudaMemPool.h>
Classes | |
| class | AssignedSegment |
| Keeps track of assigned device memory segments. | |
| class | MemoryChunk |
| Keeps track of allocated device or pinned host memory. | |
Public Member Functions | |
| cudaError_t | Initialize (size_t sizeInitial_bytes, size_t sizePinnedHost_bytes) |
| Initializes the memory pool. | |
| cudaError_t | Request (void **d_buffer, size_t size_bytes, const std::string &strCategory="General", size_t alignment=64) |
| Requests a device buffer of a given size. | |
| cudaError_t | RequestTexture (void **d_buffer, size_t size_bytes, const std::string &strCategory="General") |
| Requests a buffer of a given size to use as linear memory to map to textures. | |
| cudaError_t | Release (void *d_buffer) |
| Releases the given device buffer. | |
| cudaError_t | RequestHost (void **h_buffer, size_t size_bytes) |
| Requests pinned host memory of given size. | |
| cudaError_t | ReleaseHost (void *h_buffer) |
| Releases the given assigned buffer of pinned host memory. | |
| void | PrintState (FILE *stream=stdout) const |
| Prints the state of the memory pool to the given file. | |
| void | UpdatePool () |
| Updates the pool by removing any chunks that are unused for a long time. | |
| void | TestPool (FILE *stream=stdout) const |
| Tests the memory pool by checking all memory chunks managed. | |
| size_t | GetTextureAlignment () const |
| Gets the texture alignment for the current device. | |
| size_t | GetDeviceChunkCount () const |
| Gets the device chunk count. | |
| size_t | GetAllocatedSize () const |
| Gets the size of the allocated device memory in bytes. | |
| size_t | GetAssignedSegmentCount () const |
| Gets the assigned device memory segment count. | |
| size_t | GetAssignedSize () const |
| Gets the assigned device memory size in bytes. | |
Static Public Member Functions | |
| static MNCudaMemPool & | GetInstance () |
| Returns the only memory pool instance. | |
Manages device memory by preallocating a large amount of device memory and handing out chunks to requesters.
This avoids repeated calls to cudaMalloc and therefore reduces CUDA API overhead. The approach was suggested by [Wang et al. 2009] and [Zhou et al. 2008].
The class is designed as a singleton and might need optimization when used from multiple CPU threads.
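The idea of handing out segments from one preallocated chunk can be sketched as follows. This is a minimal illustration under stated assumptions, not the pool's actual implementation: ChunkSketch and its members are hypothetical names, segments are tracked as offsets into a chunk rather than as device pointers, a first-fit search is assumed, and alignment is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for one preallocated chunk. Offsets are relative to
// the chunk base; no real device memory is involved in this sketch.
class ChunkSketch {
public:
    explicit ChunkSketch(std::size_t capacity) : capacity_(capacity) {}

    // First-fit search (an assumption): returns true and writes the offset
    // of the new assigned segment on success.
    bool request(std::size_t size, std::size_t* offset) {
        assert(offset != nullptr);
        if (size == 0) return false;
        std::size_t cursor = 0;  // end of the previously visited segment
        for (const auto& seg : assigned_) {         // ascending offset order
            if (seg.first - cursor >= size) break;  // gap before seg fits
            cursor = seg.first + seg.second;
        }
        if (capacity_ - cursor < size) return false;  // chunk exhausted
        assigned_[cursor] = size;
        *offset = cursor;
        return true;
    }

    // Returns false if the offset was never handed out by this chunk.
    bool release(std::size_t offset) { return assigned_.erase(offset) == 1; }

    std::size_t assignedSegmentCount() const { return assigned_.size(); }

    std::size_t assignedSize() const {
        std::size_t total = 0;
        for (const auto& seg : assigned_) total += seg.second;
        return total;
    }

private:
    std::size_t capacity_;
    std::map<std::size_t, std::size_t> assigned_;  // offset -> size
};
```

Because every request is served from memory that is already allocated, no allocation call is needed per request; only the bookkeeping map changes.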
| size_t MNCudaMemPool::GetAllocatedSize | ( | ) | const |
Gets the size of the allocated device memory in bytes.
| size_t MNCudaMemPool::GetAssignedSegmentCount | ( | ) | const |
Gets the assigned device memory segment count.
This is the number of assigned segments within the device memory chunks. Each Request() or RequestTexture() creates a new assigned segment.
| size_t MNCudaMemPool::GetAssignedSize | ( | ) | const |
Gets the assigned device memory size in bytes.
| size_t MNCudaMemPool::GetDeviceChunkCount | ( | ) | const [inline] |
Gets the device chunk count.
This is the number of device chunks currently managed by this pool.
| MNCudaMemPool & MNCudaMemPool::GetInstance | ( | ) | [static] |
Returns the only memory pool instance.
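A singleton accessor of this kind is commonly written with a function-local static. The sketch below is an assumption about the general pattern, not the actual MNCudaMemPool code: PoolSingletonSketch, addAssigned and assignedBytes are hypothetical names, and C++11 is assumed. The mutex shows one way the bookkeeping could be protected if several CPU threads used the pool.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical singleton with a single piece of guarded bookkeeping state.
class PoolSingletonSketch {
public:
    static PoolSingletonSketch& GetInstance() {
        // Initialization of a function-local static is thread-safe since C++11.
        static PoolSingletonSketch instance;
        return instance;
    }

    // Copying would create a second instance, so it is disabled.
    PoolSingletonSketch(const PoolSingletonSketch&) = delete;
    PoolSingletonSketch& operator=(const PoolSingletonSketch&) = delete;

    // Member calls are not synchronized automatically; lock explicitly.
    void addAssigned(std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        assignedBytes_ += bytes;
    }

    std::size_t assignedBytes() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return assignedBytes_;
    }

private:
    PoolSingletonSketch() = default;

    mutable std::mutex mutex_;
    std::size_t assignedBytes_ = 0;
};
```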
| size_t MNCudaMemPool::GetTextureAlignment | ( | ) | const [inline] |
Gets the texture alignment for the current device.
Linear device memory that is mapped to texture memory has to be aligned using this alignment. Otherwise, offsets have to be used when binding the texture using the CUDA API.
| cudaError_t MNCudaMemPool::Initialize | ( | size_t | sizeInitial_bytes, |
| size_t | sizePinnedHost_bytes | ||
| ) |
Initializes the memory pool.
| sizeInitial_bytes | The size of the initial device chunk in bytes. |
| sizePinnedHost_bytes | The size of the only pinned host chunk in bytes. |
Returns: cudaSuccess, else some error value.
| void MNCudaMemPool::PrintState | ( | FILE * | stream = stdout ) | const |
Prints the state of the memory pool to the given file.
| [in] | stream | The file stream to print to. |
| cudaError_t MNCudaMemPool::Release | ( | void * | d_buffer ) |
Releases the given device buffer.
This only works if the buffer was allocated from this pool. After this call, the buffer is no longer valid.
| [in] | d_buffer | The buffer to release. |
Returns: cudaSuccess, else some error value.
| cudaError_t MNCudaMemPool::ReleaseHost | ( | void * | h_buffer ) |
Releases the given assigned buffer of pinned host memory.
This only works if the buffer was allocated from this pool. After this call, the buffer is no longer valid.
| [in] | h_buffer | The pinned host buffer to release. |
Returns: cudaSuccess, else some error value.
| cudaError_t MNCudaMemPool::Request | ( | void ** | d_buffer, |
| size_t | size_bytes, | ||
| const std::string & | strCategory = "General", | ||
| size_t | alignment = 64 | ||
| ) |
Requests a device buffer of a given size.
You can specify an alignment for the memory segment. This allows the segment to be used for coalesced access or for linear memory that is mapped to textures. For example, coalesced access to 64-bit words on devices of compute capability 1.1 requires an alignment of 128 bytes (16 * 8).
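The alignment arithmetic can be sketched as follows. This is an illustration only, not the pool's actual code: alignUp and the constants are hypothetical names, and the alignment is assumed to be a power of two. The factor 16 is read here as the number of threads in a half-warp, each accessing one 8-byte word.

```cpp
#include <cassert>
#include <cstddef>

// Rounds offset up to the next multiple of alignment.
// Assumption: alignment is a nonzero power of two.
inline std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// The 128-byte example: 16 threads, each reading one 64-bit (8-byte) word.
constexpr std::size_t kHalfWarpThreads = 16;
constexpr std::size_t kWordBytes = 8;
constexpr std::size_t kCoalescedAlignment = kHalfWarpThreads * kWordBytes;
```

For instance, a segment that would otherwise start at byte offset 130 within a chunk would be moved to alignUp(130, kCoalescedAlignment), which is 256.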
| [out] | d_buffer | The allocated device memory buffer. |
| | size_bytes | The requested size in bytes. |
| | strCategory | Category of the request. Used for bookkeeping only. |
| | alignment | The alignment in bytes. |
Returns: cudaSuccess, else some error value.
| cudaError_t MNCudaMemPool::RequestHost | ( | void ** | h_buffer, |
| size_t | size_bytes | ||
| ) |
Requests pinned host memory of given size.
Note that pinned host memory is limited, as the pool currently provides only a single fixed chunk of the size chosen at initialization. This is because allocating large amounts of pinned host memory can reduce system performance significantly. See the CUDA programming guide for more information.
| [out] | h_buffer | Pinned host memory of size_bytes bytes. |
| | size_bytes | The size in bytes. |
Returns: cudaSuccess, else some error value.
| cudaError_t MNCudaMemPool::RequestTexture | ( | void ** | d_buffer, |
| size_t | size_bytes, | ||
| const std::string & | strCategory = "General" | ||
| ) |
Requests a buffer of a given size to use as linear memory to map to textures.
The buffer is aligned according to the CUDA device properties, which avoids having to use the offsets returned by cudaBindTexture(). This method is equivalent to calling Request() with a special alignment parameter.
| [out] | d_buffer | The allocated buffer. |
| | size_bytes | The size in bytes. |
| | strCategory | Category of the request. |
Returns: cudaSuccess, else some error value.
| void MNCudaMemPool::TestPool | ( | FILE * | stream = stdout ) | const |
Tests the memory pool by checking all memory chunks managed.
Checks memory chunks for errors, e.g. non-disjoint assigned segments.
| [in] | stream | File for result output. |
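A check for non-disjoint assigned segments could look like the sketch below. It is an assumption about what such a test involves, not the actual TestPool() code: SegmentSketch and segmentsAreValid are hypothetical names, and segments are described by offset and size within a single chunk.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one assigned segment within a chunk.
struct SegmentSketch {
    std::size_t offset;  // start of the segment, relative to the chunk base
    std::size_t size;    // segment size in bytes
};

// Returns true if all segments lie inside the chunk and none overlap.
// The vector is taken by value so that sorting does not alter the caller's data.
inline bool segmentsAreValid(std::vector<SegmentSketch> segs,
                             std::size_t chunkSize) {
    std::sort(segs.begin(), segs.end(),
              [](const SegmentSketch& a, const SegmentSketch& b) {
                  return a.offset < b.offset;
              });
    std::size_t prevEnd = 0;  // end of the previous segment in sorted order
    for (const SegmentSketch& s : segs) {
        if (s.offset < prevEnd) return false;  // overlaps its predecessor
        if (s.offset > chunkSize || s.size > chunkSize - s.offset)
            return false;                      // reaches beyond the chunk
        prevEnd = s.offset + s.size;
    }
    return true;
}
```

After sorting by offset, it is enough to compare each segment with its direct predecessor, so the check takes O(n log n) time for n segments.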
| void MNCudaMemPool::UpdatePool | ( | ) |
Updates the pool by removing any chunks that are unused for a long time.
Call this regularly if you want to avoid holding too much GPU memory for the lifetime of your application. In most cases, pool usage has peaks during which large new chunks of device memory are added. Afterwards, these chunks are completely unused. This method tries to eliminate those chunks after some time has passed.
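The removal of long-unused chunks can be sketched as follows. This is a hypothetical illustration, not the actual UpdatePool() logic: ChunkInfoSketch, removeStaleChunks and the idle threshold are assumptions, and a real pool would presumably also free the device memory of each removed chunk (for example with cudaFree).

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical bookkeeping record for one device chunk.
struct ChunkInfoSketch {
    std::size_t assignedSegments;  // segments currently handed out
    Clock::time_point lastUsed;    // last time a segment was assigned or released
};

// Erases every chunk that has no assigned segments and has been idle for
// longer than maxIdle. Returns the number of chunks removed.
inline std::size_t removeStaleChunks(std::vector<ChunkInfoSketch>& chunks,
                                     Clock::time_point now,
                                     Clock::duration maxIdle) {
    const std::size_t before = chunks.size();
    chunks.erase(
        std::remove_if(chunks.begin(), chunks.end(),
                       [&](const ChunkInfoSketch& c) {
                           return c.assignedSegments == 0 &&
                                  now - c.lastUsed > maxIdle;
                       }),
        chunks.end());
    return before - chunks.size();
}
```

Chunks that still hold assigned segments are never removed in this sketch, regardless of how long ago they were last used.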
| MNRT Source Code Documentation (Version 1.0) - Copyright © Mathias Neumann 2010 |
Generated on Tue Nov 30 2010 14:28:27 for MNRT by Doxygen 1.7.2