Our best-performing filter implementation used the Direct Form II Transposed filter structure, 8 POSIX threads, a reordered data layout, the SSE3 instruction set, and the -O3 compiler optimization flag. Together, these gave the best balance we found among cache hit rate, CPU utilization, and cache pollution. Assuming a sampling rate of 25 kHz, we achieved our goal of real-time filtering: one second of simulated data was processed in 14 ms, roughly 71 times faster than real time (this figure was calculated using the real-time processing speed formula specified in the Results section).
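To make the Direct Form II Transposed structure concrete, the sketch below implements one second-order section in scalar C. It assumes the bandpass filter is realized as a cascade of second-order sections with coefficients normalized so that a0 = 1; this decomposition is an assumption, not a description of our code. The names `biquad_coeffs`, `biquad_state`, and `biquad_df2t` are hypothetical. The sketch also leaves out the SSE3 vectorization, data reordering, and threading described above.

```c
#include <stddef.h>

/* Coefficients of one second-order section, normalized so that a0 == 1:
 * H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2) */
typedef struct {
    float b0, b1, b2;
    float a1, a2;
} biquad_coeffs;

/* Per-channel filter state: the two delay registers of the DF-II-T form.
 * Persisting it between calls lets a stream be filtered block by block. */
typedef struct {
    float z1, z2;
} biquad_state;

/* Filter n samples from in[] into out[] using Direct Form II Transposed.
 * in and out may alias: each input sample is read before the output
 * sample at the same index is written. */
void biquad_df2t(const biquad_coeffs *c, biquad_state *s,
                 const float *in, float *out, size_t n)
{
    float z1 = s->z1, z2 = s->z2;   /* work on local copies of the state */
    for (size_t i = 0; i < n; ++i) {
        float x = in[i];
        float y = c->b0 * x + z1;            /* output uses only z1 */
        z1 = c->b1 * x - c->a1 * y + z2;     /* update the first delay */
        z2 = c->b2 * x - c->a2 * y;          /* update the second delay */
        out[i] = y;
    }
    s->z1 = z1;                     /* save state for the next block */
    s->z2 = z2;
}
```

In a filter bank, each channel (and each cascaded section) would keep its own `biquad_state`. Sections are chained by passing one section's output buffer as the next section's input. DF-II-T needs only two state variables per section, which keeps the per-channel working set small.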
Note that we obtained our results on a single computer architecture. We expect many of our optimization methods, such as SSE vectorization (on x86 processors) and data reordering, to remain effective on other machines, because the cache organization of most modern processors is broadly similar to that of our benchmark computer. Modern CPUs differ mainly in the number of physical cores they contain, and this changes the number of POSIX threads required for optimal CPU utilization: machines with more cores can efficiently use more threads, while machines with fewer cores will use fewer.
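Because the best thread count depends on the core count, one option is to choose it at runtime instead of fixing it at 8. The sketch below assumes a Linux, macOS, or BSD host where `sysconf(_SC_NPROCESSORS_ONLN)` is available. That call reports online logical processors, including SMT/hyper-threads, not physical cores. The helpers `choose_thread_count`, `channel_range`, and `run_workers` are hypothetical, and the worker loop only counts the channels it owns, standing in for where filtering would happen.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* Pick a worker count from the number of online processors, clamped to
 * [1, max_threads]. _SC_NPROCESSORS_ONLN is widely supported but not
 * mandated by POSIX, so a failed query falls back to one thread. */
long choose_thread_count(long max_threads)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1)
        cores = 1;
    return cores < max_threads ? cores : max_threads;
}

/* Split num_channels across n threads as evenly as possible; the first
 * (num_channels % n) threads each receive one extra channel. */
void channel_range(size_t num_channels, size_t n, size_t t,
                   size_t *begin, size_t *end)
{
    size_t base = num_channels / n, extra = num_channels % n;
    *begin = t * base + (t < extra ? t : extra);
    *end = *begin + base + (t < extra ? 1 : 0);
}

typedef struct {
    size_t begin, end;   /* channels [begin, end) owned by this thread */
    size_t processed;    /* placeholder for real per-channel work */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *w = p;
    for (size_t ch = w->begin; ch < w->end; ++ch)
        w->processed++;  /* a real worker would filter channel ch here */
    return NULL;
}

/* Run one pass over all channels with n threads and return the total
 * number of channels handled. */
size_t run_workers(size_t num_channels, size_t n)
{
    pthread_t tid[MAX_WORKERS];
    worker_arg args[MAX_WORKERS];
    int started[MAX_WORKERS];
    size_t total = 0;

    if (n < 1) n = 1;
    if (n > MAX_WORKERS) n = MAX_WORKERS;
    for (size_t t = 0; t < n; ++t) {
        channel_range(num_channels, n, t, &args[t].begin, &args[t].end);
        args[t].processed = 0;
        started[t] = pthread_create(&tid[t], NULL, worker, &args[t]) == 0;
        if (!started[t])
            worker(&args[t]);  /* fall back to running in this thread */
    }
    for (size_t t = 0; t < n; ++t) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += args[t].processed;
    }
    return total;
}
```

A caller might use `run_workers(num_channels, (size_t)choose_thread_count(MAX_WORKERS))`. Whether logical processors beyond the physical core count actually help would still need to be measured on each machine.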
Processing data at this speed is of paramount importance to the Open Ephys project and to neural signal processing in general. Our filter bank is designed to apply a generic bandpass filter to all incoming signals, and the filtered signals will typically require additional real-time processing. For example, many neural signal processing projects need to detect specific events (usually peaks) in real time and deliver impulses that alter the source neuron's behavior.
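As an illustration of such a downstream stage, the sketch below implements a simple upward threshold-crossing detector with a refractory period. This is a common, simple stand-in for real-time peak detection and is not part of our filter bank. The `peak_detector` type, `detect_crossings` function, and the threshold and refractory parameters are hypothetical. The detector keeps its state between calls so that it can consume the filter output block by block.

```c
#include <stddef.h>

/* Streaming detector state for one channel. */
typedef struct {
    float threshold;     /* detection threshold, in signal units */
    size_t refractory;   /* samples to skip after each detection */
    size_t holdoff;      /* samples left in the current refractory window */
    float prev;          /* last sample seen, carried across blocks */
} peak_detector;

/* Scan one block of filtered samples. Writes the block-relative indices
 * of upward threshold crossings into events[] (at most max_events) and
 * returns how many were stored. Because prev persists between calls, a
 * crossing that straddles a block boundary is still detected. */
size_t detect_crossings(peak_detector *d, const float *x, size_t n,
                        size_t *events, size_t max_events)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (d->holdoff > 0) {
            d->holdoff--;                    /* inside refractory window */
        } else if (d->prev < d->threshold && x[i] >= d->threshold) {
            if (count < max_events)
                events[count++] = i;         /* record the crossing */
            d->holdoff = d->refractory;      /* start refractory window */
        }
        d->prev = x[i];
    }
    return count;
}
```

The refractory window prevents one noisy peak from producing several detections. In a closed-loop system, each reported index could trigger the stimulation pulse mentioned above.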




