Stream processing and reactive programming

October 30, 2014 — July 1, 2015

computers are awful
concurrency hell
premature optimization
signal processing
stringology

A lazy bookmark for practical details on processing and transforming possibly infinite streams of data, from signals to parse trees. Also a place to disambiguate the several meanings of “transducer”.
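One meaning of “transducer” is the Clojure sense: a function that transforms one reducing function into another, so that map and filter steps compose without caring whether the items come from a list, a file, or a live stream. (Another meaning is the finite-state transducer of automata theory, which maps input strings to output strings.) Below is a minimal Python sketch of the Clojure sense. The names `mapping`, `filtering`, `compose` and `transduce` are illustrative choices here, not a library API.

```python
from functools import reduce
from typing import Any, Callable, Iterable

# A reducing function combines an accumulator with one input item.
Reducer = Callable[[Any, Any], Any]
# A transducer turns one reducing function into another.
Transducer = Callable[[Reducer], Reducer]


def mapping(f: Callable[[Any], Any]) -> Transducer:
    """Transducer that applies f to each item before passing it on."""
    def xform(step: Reducer) -> Reducer:
        return lambda acc, x: step(acc, f(x))
    return xform


def filtering(pred: Callable[[Any], bool]) -> Transducer:
    """Transducer that passes on only the items satisfying pred."""
    def xform(step: Reducer) -> Reducer:
        return lambda acc, x: step(acc, x) if pred(x) else acc
    return xform


def compose(*xforms: Transducer) -> Transducer:
    """Compose transducers so that items flow through them left to right."""
    def composed(step: Reducer) -> Reducer:
        for xf in reversed(xforms):
            step = xf(step)
        return step
    return composed


def transduce(xform: Transducer, step: Reducer, init: Any, items: Iterable) -> Any:
    """Reduce items with step, after applying the transformation xform."""
    return reduce(xform(step), items, init)


# Keep the odd numbers, square them, then sum; the source could equally be a generator.
odd_squares = compose(filtering(lambda n: n % 2 == 1), mapping(lambda n: n * n))
total = transduce(odd_squares, lambda acc, x: acc + x, 0, range(10))
print(total)  # 165
```

The same `odd_squares` value can be reused with a different reducing function (for example, appending to a list) or a different source, which is the point of separating the transformation from the input and output.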

Stream processing is used for parallel or offline processing of large data sets that do not fit in core, and for processing things that happen in real time, such as UI events.
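What makes the streaming style work for data that does not fit in core is that each stage pulls one item at a time, so memory stays bounded even when the source is unbounded. A minimal sketch with Python generators; the arithmetic source below is a hypothetical stand-in for, say, a file read line by line or a sensor feed.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[int]:
    """Convert raw text records to integers, one at a time."""
    for line in lines:
        yield int(line.strip())


def running_max(values: Iterable[int]) -> Iterator[int]:
    """Emit the maximum seen so far after each item, keeping O(1) state."""
    best = None
    for v in values:
        best = v if best is None else max(best, v)
        yield best


# Hypothetical unbounded source: nothing is computed until a consumer asks.
raw = (f"{(7 * i) % 11}\n" for i in count())
pipeline = running_max(parse(raw))

# Take only what we need; the infinite source is never materialized.
print(list(islice(pipeline, 6)))  # [0, 7, 7, 10, 10, 10]
```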

I have in mind objects more general than singly-indexed real-valued signals: tokens, maybe. Classic DSP belongs elsewhere. Infrastructure for doing stream processing in a distributed fashion is filed under message queues.
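Taking tokens as the stream elements raises a practical wrinkle that real-valued samples do not have: input arrives in chunks, and a token can straddle a chunk boundary. A minimal sketch of a lazy tokenizer that handles this by carrying the trailing partial token over to the next chunk. The token grammar (letter runs, digit runs, single punctuation characters) is a hypothetical choice for illustration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical token grammar: runs of letters, runs of digits, or one punctuation character.
TOKEN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokens(chunks: Iterable[str]) -> Iterator[str]:
    """Lazily split a stream of text chunks into tokens.

    A token that touches the end of the current buffer might continue in the
    next chunk, so it is held back and re-scanned together with that chunk.
    """
    carry = ""
    for chunk in chunks:
        buf = carry + chunk
        matches = list(TOKEN.finditer(buf))
        if matches and matches[-1].end() == len(buf):
            # The final token reaches the end of the buffer: it may be incomplete.
            carry = buf[matches[-1].start():]
            matches = matches[:-1]
        else:
            carry = ""
        for m in matches:
            yield m.group()
    if carry:
        # End of stream: the held-back text is now complete.
        yield from (m.group() for m in TOKEN.finditer(carry))


print(list(tokens(["foo ba", "r+12", "3 baz"])))  # ['foo', 'bar', '+', '123', 'baz']
```

Chunking does not change the result: the output matches tokenizing the concatenated text in one go, while only one chunk plus one partial token is held in memory.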

In statistics and machine learning, stream processing connects with online learning, which incorporates data as it arrives, as in distributed statistics.
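The simplest instance of incorporating data as it arrives is maintaining summary statistics without storing the data. As an illustration (a standard technique swapped in here, not something the sources above prescribe), here is Welford's online algorithm for the running mean and variance; the class name `RunningStats` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: O(1) memory per update, numerically stable."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # Sum of squared deviations from the current mean.

    def update(self, x: float) -> None:
        """Fold one new observation into the running estimates."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from both the old mean (delta) and the new mean.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Unbiased sample variance; NaN until two points have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean, stats.variance)
```

The naive alternative, accumulating the sum and the sum of squares, also runs in constant memory but loses precision badly when the variance is small relative to the mean; Welford's update avoids that cancellation.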

1 Functional reactive programming

See FRP.

2 Streaming data analysis

Online, possibly real-time, certainly memory-constrained.
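A classic tool for the memory-constrained case is reservoir sampling: keep a uniform random sample of k items from a stream whose length is unknown in advance, using O(k) memory. A sketch of the textbook variant (Algorithm R); the function name and the optional `rng` parameter are choices made for this illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length.

    The first k items fill the reservoir. After that, item i (0-based)
    replaces a uniformly chosen slot with probability k / (i + 1), which
    leaves every item seen so far in the sample with equal probability.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)  # Uniform over 0..i inclusive.
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(100_000), k=5, rng=random.Random(42))
print(sample)
```

Passing a seeded `random.Random` makes runs reproducible; if the stream has fewer than k items, the whole stream is returned.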

2.1 Qminer

?

The QMiner project describes its features as follows:

Unstructured data: QMiner provides support for unstructured data, such as text and social networks, across the entire processing pipeline, from feature engineering and indexing to aggregation and machine learning.

Search: QMiner provides out-of-the-box support for indexing, querying and aggregating structured, unstructured and geospatial data using a simple query language.

JavaScript API: QMiner applications are implemented in JavaScript, making it easy to get started. Using the JavaScript API, it is easy to compose complete data processing pipelines and integrate with other systems via RESTful web services.

C++ library: QMiner is implemented in C++ and can be included as a library in custom C++ projects, thus providing them with stream processing and data analytics capabilities.

3 To read

4 References

Hu, Pehlevan, and Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.
McSherry, Isaacs, Isard, et al. 2013. Differential dataflow. US20130304744 A1.
Murray, McSherry, Isaacs, et al. 2013. “Naiad: A Timely Dataflow System.” In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13.
Pan, Zhang, Wu, et al. 2014. “Online Community Detection for Large Complex Networks.” PLoS ONE.
Ryabko, and Ryabko. 2010. “Nonparametric Statistical Inference for Ergodic Processes.” IEEE Transactions on Information Theory.
Sorensen, and Gardner. 2010. “Programming with Time: Cyber-Physical Programming with Impromptu.” In ACM SIGPLAN Notices.