Data management and analysis

Course info:

Semester: 5

Course type: General Foundation

ECTS: 6

Hours per week: 3

Professor: T.B.D.

Teaching style: Face to face, tutorials and project work

Grading: Final written exam (80%), Individual exercises (20%)

Activity                              Workload (hours)
Lectures                              26
Tutorials                             13
Group work on Laboratory Projects     48
Individual study                      63
Course total                          150


Learning Results

The aim of the course is to present large-scale data management techniques and advanced Data Mining issues, as well as their applications.

Upon completion of the course, students will be able to:

  • Understand issues related to large-scale data repositories (HDMS etc.),
  • Delve into methods and practices related to large-scale Data Mining on the World Wide Web and cloud-based systems,
  • Familiarize themselves with research approaches and new solutions to the problems that arise,
  • Familiarize themselves with applications of theory to real problems, so as to acquire the specialized problem-solving skills required in research and/or innovation to develop new knowledge and processes and to integrate knowledge from different fields,
  • Gain critical awareness of knowledge issues in the field of large-scale data management and its interconnection with other fields,
  • Acquire the skills needed to continue their studies in the field of Large-Scale Data Management and Analytics in a largely autonomous fashion.

Skills acquired

  • Search, analysis and synthesis of data and information, using the necessary technologies,
  • Individual work,
  • Work in an interdisciplinary environment,
  • Production of new research ideas,
  • Creative and critical thinking.
Course content
  • Large-scale file systems, the Map-Reduce and Spark platforms,
  • Link Analysis,
  • Advertising on the World Wide Web,
  • Data Mining from Social Network Graphs,
  • Recommender Systems,
  • Linked Open Data (LOD) platforms,
  • Big data and the Semantic Web,
  • Data Mining and Business Intelligence.
Recommended bibliography
  • J. Leskovec, A. Rajaraman, J.D. Ullman, Mining of Massive Datasets, Cambridge, 2nd edition, 2016.
  • P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Pearson, 2nd edition, 2018.
  • S. Walkowiak, Big Data Analytics with R, Packt Publishing, 2016.
  • P. Cimiano, O. Corcho, V. Presutti, L. Hollink, S. Rudolph (eds.), “The Semantic Web: Semantics and Big Data,” Proceedings of 10th International Conference, ESWC 2013, Montpellier, LNCS 7882, Springer, 2013.
  • H. Chen, R. H. L. Chiang, V. C. Storey, “Business Intelligence and Analytics: From Big Data to Big Impact,” MIS Quarterly, vol. 36, issue 4, pp. 1165-1188, December 2012.
  • W. Fan, A. Bifet, “Mining Big Data: Current Status, and Forecast to the Future,” SIGKDD Explorations, vol. 14, issue 2, 2014.
  • A. R. Ganguly and A. Gupta, Data Mining Technologies and Decision Support Systems for Business and Scientific Applications, Encyclopedia of Data Warehousing and Mining, 2005.
  • R. Kohavi, N. J. Rothleder, E. Simoudis, “Emerging trends in business analytics,” Communications of the ACM – Evolving data mining into solutions for insights, vol. 45, issue 8, pp. 45-48, August 2002.
  • J.P. Shim, M. Warkentin, J.F. Courtney, D.J. Power, R. Sharda, Ch. Carlsson, “Past, Present and Future of Decision Support Technology,” Decision Support Systems: Directions for the Next Decade, vol. 33, issue 2, pp. 111-126, June 2002.
  • Y. Sun, J. Han, Mining Heterogeneous Information Networks: Principles and Methodologies, Morgan & Claypool, 2012.
  • H.J. Watson, B. H. Wixom, “The Current State of Business Intelligence,” IEEE Computer, vol. 40, issue 9, pp. 96-99, September 2007.
  • I. Witten, E. Frank, M. Hall, Data Mining: Practical Machine Learning Tools and Techniques (4th edition), Morgan Kaufmann, 2019.
  • M. d’Aquin, G. Kronberger, M. Suárez-Figueroa, “Combining Data Mining and Ontology Engineering to enrich Ontologies and Linked Data”, Proceedings of 1st International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, Heraklion, 2012.
  • T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool, 2011.
  • L. Palathingal, S. Dascalu, F. C. Harris Jr, Y. Varol, “A Brief Survey of Data Curation Literature”, Proceedings of CATA 2015, Honolulu, Hawaii, March 2015.
  • H. Paulheim, “Exploiting Linked Open Data as Background Knowledge in Data Mining”, Proceedings of International Workshop on Data Mining on Linked Data (DMoLD), Prague, 2013.
  • X. Wu et al., “Top 10 Algorithms in Data Mining,” Knowledge and Information Systems, vol. 14, pp. 1-37, 2008.
  • E. Turban, R. Sharda, D. Delen, D. King, Business Intelligence: A Managerial Approach (2nd Edition), Prentice Hall, 2011.
  • M. North, Data Mining for the Masses with implementations in RapidMiner and R, 2016.