Determining the life cycle of event stream data requires us first to understand our business and how fast it changes. If event data is analyzed, the results of that analysis will likely feed another process, for example a customer relationship management (CRM) system or a campaign management system such as Salesforce.com. Here are some questions I would ask:
- What systems are being fed by this analysis?
- How fast do the results need to be fed to these systems? We should list those systems, along with the data elements each one requires; this will help us determine how much data manipulation is needed to complete the task.
- Does human analysis need to take place? If so, it might be necessary to propagate this data to another data store for future analysis.
- How long does this event remain relevant?
- What data elements of this event will be required for the data warehouse? How long will I retain this information for after-event analysis? (A sketch capturing these answers as a simple policy follows this list.)
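To make these questions concrete, here is a minimal sketch of how the answers might be recorded as a per-event-type life cycle policy. The event type, downstream systems, field names, and retention window shown are all hypothetical; they stand in for whatever your own answers turn out to be.

```python
from dataclasses import dataclass

@dataclass
class EventLifecyclePolicy:
    """Records the answers to the life cycle questions for one event type."""
    event_type: str
    downstream_systems: list[str]  # systems fed by the analysis (e.g., a CRM)
    required_elements: list[str]   # data elements those systems need
    feed_latency_seconds: int      # how fast results must reach those systems
    needs_human_review: bool       # if True, propagate to a separate store
    warehouse_elements: list[str]  # elements kept for the data warehouse
    retention_days: int            # how long to keep data for after-event analysis

# Hypothetical example: a purchase event feeding a CRM and a campaign system.
purchase_policy = EventLifecyclePolicy(
    event_type="purchase",
    downstream_systems=["crm", "campaign_mgmt"],
    required_elements=["customer_id", "product_id", "amount"],
    feed_latency_seconds=60,       # near-real-time feed to the CRM
    needs_human_review=False,
    warehouse_elements=["customer_id", "product_id", "amount", "event_ts"],
    retention_days=365,            # one year of after-event analysis
)
```

A policy object like this makes the feed and retention requirements explicit, so the answers to the questions above become inputs to the design rather than assumptions buried in code.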
These questions relate directly to the collection and consumption of event data. If we are collecting real-time data – and assessing, aggregating, correlating, and analyzing it – we should also consider the following:
- Where are we storing this data for analysis? This data store will see constant read and write activity.
- Are we off-loading some of this data at a set interval – hourly, for example – for further after-event analysis? If so, what type of data store is required? The answer depends entirely on how the data is accessed and for what purpose; the off-loaded data could feed the data warehouse on that same hourly cycle (a sketch of such a job follows this list).
- What analysis programs or reporting tools will consume this data? The answer will drive the design specifications.
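As a rough illustration of the hourly off-load, the sketch below copies the previous hour's events from the constantly read-and-written "hot" store into a warehouse staging table. The table names, schema, and the use of SQLite are assumptions made for the sake of a runnable example; a real job would point at your actual stores and run under whatever scheduler you already use (cron, Airflow, and so on).

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def offload_last_hour(hot_store: sqlite3.Connection,
                      warehouse: sqlite3.Connection) -> int:
    """Copy the previous hour's events from the hot store into a
    warehouse staging table; return the number of rows moved."""
    cutoff = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
    rows = hot_store.execute(
        "SELECT event_id, event_type, payload, event_ts "
        "FROM events WHERE event_ts >= ?",
        (cutoff,),
    ).fetchall()
    warehouse.executemany(
        "INSERT INTO staging_events (event_id, event_type, payload, event_ts) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    warehouse.commit()
    return len(rows)

# Demo with in-memory databases standing in for the real stores.
hot = sqlite3.connect(":memory:")
wh = sqlite3.connect(":memory:")
hot.execute("CREATE TABLE events "
            "(event_id TEXT, event_type TEXT, payload TEXT, event_ts TEXT)")
wh.execute("CREATE TABLE staging_events "
           "(event_id TEXT, event_type TEXT, payload TEXT, event_ts TEXT)")
hot.execute("INSERT INTO events VALUES ('e1', 'purchase', '{}', ?)",
            (datetime.now(timezone.utc).isoformat(),))
print(offload_last_hour(hot, wh), "row(s) off-loaded")
```

Trimming the hot store on this cadence preserves its read/write performance, while the staging table gives the warehouse load a stable, append-only source.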