Abstract:
Automating query generation for Complex Event Processing (CEP) enables users to obtain useful insights from data, going beyond what they already know. Existing automation techniques are computationally expensive and require extensive domain-specific human interaction. We propose a technique that combines parallel coordinates and shapelets to automate CEP query generation. First, if the provided dataset is unannotated, we run it through a clustering algorithm to group the time instances into different event groups. Each instance is then represented as a line on a set of parallel coordinates. Next, a shapelet-learner algorithm is applied to those lines to extract the relevant shapelets, which are ranked by their information gain. The shapelets with similar information gain are then divided into groups by a shapelet-merger algorithm. The best group for each event is identified based on the event distribution of the dataset and is used to automatically generate queries that detect the complex events. This technique can be applied to both univariate and multivariate time-series data, and it is computationally and memory efficient. It enables users to focus only on the shapelets with relevant information gains. We demonstrate the utility of this technique using a set of real-world datasets.
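
To make the ranking step concrete, the following is a minimal sketch of how the information gain of one candidate shapelet could be computed. It assumes the textbook shapelet formulation (minimum Euclidean distance between the shapelet and any window of an instance, followed by the best entropy-reducing distance threshold), which may differ from the shapelet-learner described in this paper. The function names `entropy`, `min_distance`, and `information_gain`, and the toy data in the usage note, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of event labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def min_distance(instance, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equally long window of the instance (one parallel-coordinates line)."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((instance[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(instance) - m + 1)
    )


def information_gain(instances, labels, shapelet):
    """Best information gain over all distance thresholds that split
    the instances into a 'near the shapelet' and a 'far from it' group."""
    # Assumption: every instance is at least as long as the shapelet.
    dists = sorted(
        ((min_distance(x, shapelet), y) for x, y in zip(instances, labels)),
        key=lambda pair: pair[0],
    )
    n = len(labels)
    base = entropy(labels)
    best = 0.0
    for k in range(1, n):
        # A threshold can only fall between two distinct distances.
        if dists[k - 1][0] == dists[k][0]:
            continue
        left = [y for _, y in dists[:k]]
        right = [y for _, y in dists[k:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return best
```

As a usage illustration on made-up data, calling `information_gain([[1, 5, 1, 1], [1, 1, 5, 1], [1, 1, 1, 1], [2, 2, 2, 2]], ["A", "A", "B", "B"], [1, 5, 1])` scores a spike-shaped candidate that occurs only in the instances labelled "A". Candidates would then be sorted by this score before any grouping of shapelets with similar gain.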