PARALLELIZING TIME-SERIES SESSION DATA ANALYSIS WITH A TYPE-ERASURE BASED DSEL

  • Ruo Ando National Institute of Informatics

Abstract

The Science Information Network (SINET) is a Japanese academic backbone network.  SINET consists of more than 800 universities and research institutions.  In the operation of a huge academic backbone network, more flexible querying technology is required to cope with massive time series session data and analysis of sophisticated cyber-attacks. This paper proposes a parallelizing DSEL (Domain Specific Embedded Language) processing for huge time-series session data. In our DESL, the function object is implemented by type erasure for constructing internal DSL for processing time-series data. Type erasure enables our parser to store function pointer and function object into the same *void type with class templates. We apply to scatter/gather pattern for concurrent DSEL parsing. Each thread parses DSEL to extract the tuple timestamp, source IP, and destination IP in the gather phase. In the scattering phase, we use a concurrent hash map to handle multiple thread outputs with our DSEL.

In the experiment, we have measured the elapsed time in parsing and inserting IPv4 address and timestamp data format ranging from 1,000 to 50,000 lines with 24-row items. We have also measured CPU idle time in processing 100,000,000 lines of session data with 5, 10 and 20 multiple threads. It has been turned out that the proposed method can work in feasible computing time in both cases.

Downloads

Download data is not yet available.

References

. I. Brcic: "Ideally Fast" Decimal Counters with Bi-stables. IEEE Trans. Electron. Comput. 14(5): 733-737 (1965)

. Bryan Ford: Parsing Expression Grammars: A recognition-based Syntactic Foundation, http://pdos.csail.mit.edu/~baford/packrat/popl04/

. Richard E. Pattis: EBNF: A Notation to Describe Syntax, http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf

. K. Czarnecki, J. T. O’Donnell, J. Striegnitz, and W. Taha. DSL implementation in MetaOCaml, Template Haskell, and C++. In Domain-Specific Program Generation, volume 3016 of LNCS, pages 51-72. Springer, 2003.

. W. Taha. A gentle introduction to multi-stage pro-gramming. In Domain-Specific Program Generation, Springer LNCS 3016, pages 30-50, 2003.

. S. Seefried, M. M. T. Chakravarty, and G. Keller. Optimising Embedded DSLs using Template Haskell. G. Karsai and E. Visser, editors, GPCE, volume 3286 of Lecture Notes in Computer Science, pages 186-205. Springer, 2004.

. T. Sheard and S. Peyton Jones. Template metapro-gramming for Haskell. In M. M. T. Chakravarty, edi-tor, ACM SIGPLAN Haskell Workshop 02, pages 1-16. ACM Press, Oct. 2002.

. C. Elliott. Functional images. In the Fun of Pro-gramming, “Cornerstones of Computing” series. Palgrave, Mar. 2003.

. D. MacQueen, “Modules for Standard ML”, in Pro-ceedings of the 1984 ACM Symposium on Lisp and Functional Programming, Papers Presented at the Symposium, August 1984, pages 198-207, New York, August 1984. Association for Computing Ma-chinery.

. R. Harper and M. Lillibridge, “A Type-Theoretic Approach to Higher-Order Modules with Sharing”, In Proceedings of 21st ACM Symposium on Princi-ples of Programming Languages, January 1994.

. X. Leroy. Manifest Types, Modules and Separate Compilation. In Proceedings of 21st ACM Symposi-um on Principles of Programming Languages, pages 109-122, January 1994.

. K. Fisher and J. H. Reppy. “The Design of a Class Mechanism for Moby”, In proc of SIGPLAN Con-ference on Programming Language Design and Im-plementation, pages 37-49, 1999.

. Lee, K., LaMarca, A., Chambers, C.: HydroJ: Object-oriented Pattern Matching for Evolvable Distributed Systems. In: Proc. of Object-Oriented Programming Systems and Languages (OOPSLA). (2003)

. Gapeyev, V., Pierce, B.C.: Regular Object Types. In: Proc. of European Conference on Object-oriented Programming (ECOOP). (2003)

. Chin, B., Millstein, T.: Responders, “Language Sup-port for Interactive Applications”, In Proc of Euro-pean Conference on Object-Oriented Programming (ECOOP). (2006)

. Moreau, P.E., Ringeissen, C., Vittek, M.: A Pattern Matching Compiler for Multiple Target Languages. In: In Proc. of Compiler Construction (CC), volume 2622 of LNCS. (2003) 61-76

. Liu, J., Myers, A.C.”, JMatch: Iterable Abstract Pat-tern Matching for Java”, In proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages (PADL). (2003) 110-127

. Clifton, C., Millstein, T., Leavens, G.T., Chambers, C., “Multi Java: Design Rationale, Compiler Imple-mentation, and Applications” in Prof of ACM Transactions on Programming Languages and Sys-tems 28(3) (May 2006) 517-575.

. Ernst, M., Kaplan, C., Chambers, C.: Predicate dis-patching: a unified theory of dispatch. In Proceed-ings of Euro-pean Conference on Object-Oriented Programming (ECOOP). Volume 1445 of Springer LNCS. (1998) 186-211

. Millstein, T.: Practical Predicate Dispatch. In: Proc. of Object-Oriented Programming Systems and Lan-guages (OOPSLA). (2004)’

. Wadler, P., “Views: A way for pattern matching to cohabit with data abstraction”, In Proceedings of Principles of Programming Languages (POPL). (1987)

. Okasaki, C.: Views for Standard ML. In: In SIG-PLAN Workshop on ML, pages 14-23. (1998)

. Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation and Compiling - Vol. I: Parsing. Prentice-Hall, Englewood Cliffs, N.J., 1972.

. Alexander Birman. The TMG Recognition Schema. PhD thesis, Princeton University, February 1970.

. Alexander Birman and Jeffrey D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1):1-34, August 1973.

. Stephen Robert Adams. Modular Grammars for Pro-gramming Language Prototyping. PhD thesis, Uni-versity of Southampton, 1991.

. Terence J. Parr and Russell W. Quong. ANTLR: A Predicated LL(k) parser generator. Software Prac-tice and Experience, 25(7):789-810, 1995

. Daan Leijen. Parsec, a fast combinator parser. http://www.cs.uu.nl/?daan.

. Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, Yan Chen: Sketch-based change detection: methods, evaluation, and applications. Internet Measurement Conference 2003: 234-247

. Ady Wahyudi Paundu, Takeshi Okuda, Youki Kadobayashi, Suguru Yamaguchi: Sequence-Based Analysis of Static Probe Instrumentation Data for a VMM-Based Anomaly Detection System. CSCloud 2016: 84-94

. Kathleen Fisher, Robert Gruber: PADS: a domain-specific language for processing ad hoc data. PLDI, 2005: 295-304.

. Kathleen Fisher, David Walker, Kenny Qili Zhu, Pe-ter White: From dirt to shovels: fully automatic tool generation from ad hoc data. POPL 2008: 421-434

. Kevin Borders, Jonathan Springer, Matthew Burn-side: Chimera: A Declarative Language for Stream-ing Network Traffic Analysis. USENIX Security Symposium 2012: 365-379

. Peng Gao, Xusheng Xiao, Ding Li, Zhichun Li, Kangkook Jee, Zhenyu Wu, Chung Hwan Kim, San-jeev R. Kulkarni, Prateek Mittal: SAQL: A Stream-based Query System for Real-Time Abnormal Sys-tem Behavior Detection. USENIX Security Sympo-sium, 2018: 639-656.

. Vern Paxson: Bro: a system for detecting network intruders in real-time. Comput. Networks 31(23-24): 2435-2463 (1999)

. Martin Roesch: Snort: Lightweight Intrusion Detec-tion for Networks. LISA 1999: 229-238

. Jonathan DiLorenzo, Katie Mancini, Kathleen Fish-er, Nate Foster: TxForest: A DSL for Concurrent File stores. APLAS 2019: 332-354

. Charisee Chiw, Gordon L. Kindlmann, John H. Rep-py, Lamont Samuels, Nick Seltzer: Diderot: a paral-lel DSL for image analysis and visualization. PLDI 2012: 111-120

. Todd A. Anderson, Hai Liu, Lindsey Kuper, Ehsan Totoni, Jan Vitek, Tatiana Shpeisman: Parallelizing Julia with a Non-Invasive DSL. ECOOP 2017: 4:1-4:29

. Waltz, D.; Pollack, J. (1985). "Massively parallel parsing: A strongly interactive model of natural lan-guage interpretation". Cognitive Science. 9: 51–74.

Published
2020-12-25
How to Cite
Ando, R. (2020). PARALLELIZING TIME-SERIES SESSION DATA ANALYSIS WITH A TYPE-ERASURE BASED DSEL. International Journal of Advanced Computer Technology, 9(6), 17-25. Retrieved from https://ijact.org/index.php/ijact/article/view/69
Section
Articles