Data streaming

Course info:

Semester: 7

Elective

ECTS: 6

Hours per week: 2

Professor: T.B.D.

Teaching style: Face to face, Tutorials and project work

Grading: Final written exam (70%), Final project (30%)

Activity                             Workload
Lectures                             26
Tutorials                            13
Group work on laboratory projects    48
Individual study                     63
Course total                         150

Learning Results

The course introduces students to the model of non-persistent data, i.e. data that continuously changes and evolves. In such settings, data must be processed continuously (24/7), often without the option of making several passes over a static snapshot. The course presents streaming-specific models and paradigms for data collection, storage, analysis, and decision-making.
Upon completion of the course the students will:

  • Have a good understanding of the data streaming model and how it differs from static models.
  • Know how to collect requirements about specific application needs related to streaming data (e.g. indexing).
  • Know how to query and retrieve relevant information from data streams.
  • Know how to sample and summarize information from data streams.
  • Be able to implement and interact with a basic data stream processing system.
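
As an illustration of the single-pass constraint described above, here is a minimal sketch of reservoir sampling, a classic way to keep a uniform random sample of a stream whose length is not known in advance. The names `reservoir_sample` and `sensor_readings` are hypothetical and chosen only for this example; they are not part of the course material or of any particular library.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of up to k items, reading the stream once."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def sensor_readings(n: int) -> Iterator[int]:
    """Hypothetical stand-in for an unbounded source; yields items one at a time."""
    for t in range(n):
        yield t


# The generator is consumed exactly once and never stored in full.
sample = reservoir_sample(sensor_readings(10_000), k=5, rng=random.Random(42))
print(sample)
```

Memory use stays proportional to `k` regardless of how many items arrive, which is the property that distinguishes this kind of algorithm from one that assumes a static, re-readable dataset.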

Skills acquired

  • Search, analysis and synthesis of data and information, using the necessary technologies
  • Learn new data models
  • Individual work
  • Group work
  • Work in a multidisciplinary environment
  • Work on new case studies (as opposed to ones on static data)
  • Creative and critical thinking
Course content
  • Introduction – Basic concepts of Data Streaming
  • Data Stream modeling
  • Data Stream sampling
  • Data Stream database modeling and storage
  • Data Stream querying, Top-k frequent set identification in data streams
  • Sliding-window models
  • Clustering data streams
  • Classifying data streams
  • Association rules in data streams
  • Advanced topics: sketches, histograms, tracking of queries
  • Data Stream Management Systems
Recommended bibliography
  1. Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi (editors), Data Stream Management: Processing High-Speed Data Streams, 2016, Springer.

  2. Charu C. Aggarwal, Data Streams: Models and Algorithms, 2014, Springer.

  3. Mitch Seymour, Mastering Kafka Streams and ksqlDB: Building Real-Time Systems by Example, 2021, O’Reilly.

  4. Neha Narkhede, Gwen Shapira, et al., Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale.

  5. Fabian Hueske, Vassiliki Kalavri, Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications.