Algorithms and data structures for efficient representation and processing of large databases can be combined with advances in computer architecture, as hardware-aware implementations that exploit particular hardware features. For example, many algorithms have been adapted to exploit the architecture of GPUs, FPGAs, or general-purpose CPUs providing instructions included for particular application domains.
In this paper we explore how the Intel SSE4.2 (Streaming SIMD Extensions) SIMD (Single Instruction Multiple Data) instructions included in Intel/AMD processors can improve the performance of algorithms for text indexing and searching. The SSE4.2 instruction subset provides instructions for text processing. Our implementations are mainly based on the followings: POPCOUNT counts the number of 1 bits in a word of up to 64 bits, PCMPESTRI compares two strings of length up to 16 bytes and returns the result as a binary mask.
Despite the benefits these features can bring to text processing, they have been rarely used or evaluated in the existing literature. We present case studies and experimental results that show how much text/string algorithms can benefit from the SIMD extensions. Particularly, we focus on the rank and select operations in sequences of bits and bytes, and the Horspool string search algorithm.