Redshift is a massively parallel processing (MPP) columnar database system that was recently added to Amazon’s stack of cloud services. It is often brought up in the context of “Big Data” (think: scanning through Petabytes of twitter feeds or web logs), but has proven invaluable in enhancing data processing at GreatVines. Columnar MPP database systems enable rapid loading, querying, and aggregating of large volumes of data. This is in contrast to more traditional transactional database platforms (like Postgres or MySQL) that offer fast query interaction on a record-by-record level, but often suffer considerably under the pressure of analytics demands that require a lot of aggregation at multiple levels.
At GreatVines, one of the subject areas we deal with that has the largest data volumes is RAD, or STR (Retail Account Depletions, a.k.a. Sales to Retail). RAD data represents the sales of a product that a distributor makes to each of the retail outlets that it supplies. There are over 600,000 retail outlets in the US that sell beverage alcohol and every one of them can receive multiple shipments of a product every month. RAD data captures a record for every retailer, every distributor, every product, and every day that a sale is made. Naturally, the amount of data that must be captured and managed adds up very quickly. To further complicate matters, historical RAD data is not constant! Distributors routinely make adjustments to prior month’s sales data and all of these restatements must be captured regularly and accurately.
GreatVines has overcome the challenge in handling volatility of large volumes of RAD data by building a change data capture (CDC) system on top of Redshift cluster. Our CDC system allows us to rapidly load and track all historical changes to RAD data, and Amazon’s Redshift technology enables us to do so efficiently. When distributors restate historical RAD data, it can quickly be loaded into GreatVines while maintaining an accurate and detailed record of the changes made. Prior to this system being in place, it could take almost a full day to transform and load some of our sales data. Now, the same work can be accomplished in about an hour. We have plans to roll out this technology to other subject areas in GreatVines to further enhance the speed and variety of data available for analytical reporting.