Today, NEUCHIPS Corp., an AI compute company specializing in domain-specific accelerator solutions, announced the world’s first recommendation engine – RecAccel™ – that can perform 500,000 inferences per second. Running the open-source PyTorch DLRM benchmark, RecAccel™ outperforms a server-class CPU and an inference GPU by 28X and 65X, respectively. It is equipped with an ultra-high-capacity, high-bandwidth memory subsystem for embedding-table lookup and a massively parallel compute FPGA for neural-network inference. With a PCIe Gen3 host interface, RecAccel™ is ready for data-center adoption.
RecAccel™ boosts DLRM inference performance through the following innovations:
- Embedding-specific memory architecture, allocation, and access scheme.
- Application-specific processing pipeline.
- Scalable multiply-and-accumulate (MAC) array.
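To make the two compute stages above concrete, here is a minimal, illustrative sketch of a DLRM-style inference step: a memory-bound embedding-table lookup followed by a MAC-style dot-product interaction. All table names, sizes, and values are hypothetical assumptions for demonstration only; this is not NEUCHIPS’ implementation.

```python
# Toy DLRM-style inference step (illustrative only, not NEUCHIPS' design).
# Embedding tables: one small table per categorical feature (hypothetical data).
embedding_tables = {
    "user_id": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "item_id": [[0.7, 0.8], [0.9, 1.0]],
}

def embedding_lookup(table_name, index):
    """Fetch one embedding row -- the memory-bound step that a
    high-bandwidth memory subsystem is built to accelerate."""
    return embedding_tables[table_name][index]

def mac_dot(a, b):
    """Multiply-and-accumulate: the core operation of a MAC array."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y  # one MAC per element
    return acc

# One inference: look up embeddings, then interact them via a dot product.
user_vec = embedding_lookup("user_id", 2)  # [0.5, 0.6]
item_vec = embedding_lookup("item_id", 1)  # [0.9, 1.0]
score = mac_dot(user_vec, item_vec)
print(round(score, 3))  # 0.5*0.9 + 0.6*1.0 = 1.05
```

In a real DLRM model the interaction feeds a multilayer perceptron, but the lookup-then-MAC pattern shown here is why the embedding memory subsystem and the MAC array are the two components an accelerator targets.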
Most e-commerce, online-advertising, and internet service providers employ recommendation systems to grow their business or deliver services such as search-result ranking, friend suggestions, movie recommendations, and purchase suggestions. Recommendation usually accounts for most of the AI inference workload in data centers.
“Fast and accurate recommendation inference is the key to e-commerce business success,” said Dr. Youn-Long Lin, CEO of NEUCHIPS. “RecAccel™ powers your business with the lowest latency, highest throughput, and best TCO.”