Course description
Title of the Teaching Unit
Data Science and AI
Code of the Teaching Unit
21MQ061
Academic year
2025 - 2026
Cycle
Number of credits
5
Number of hours
60
Quarter
1
Weighting
Site
Montgomery
Teaching language
French
Teacher in charge
DENDONCKER Valentin
Objectives and contribution to the program
This course is an introduction to quantitative techniques for exploring and interpreting data, as well as data preparation techniques, with a view to using them in a project involving machine learning or artificial intelligence algorithms.
By the end of the course, students will be able to choose and apply a quantitative technique that will enable them to answer a question based on existing data.
Prerequisites and corequisites
Content
The course will cover the following topics:
- Data mining
- Data cleaning
- Data transformation
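As a minimal illustration of two of the listed topics, the sketch below shows typical data cleaning and data transformation steps in pandas. It is not taken from the course materials; the dataset and column names are hypothetical.

```python
# A hypothetical sketch (not from the course materials) of data cleaning
# and data transformation with pandas, ahead of a machine-learning step.
import pandas as pd

# Hypothetical raw data with inconsistent casing and missing values.
raw = pd.DataFrame({
    "city": ["Brussels", "brussels", "Liège", None],
    "income": [30000, 32000, None, 28000],
})

# Data cleaning: drop rows with no city label, then normalize casing.
clean = raw.dropna(subset=["city"]).copy()
clean["city"] = clean["city"].str.title()

# Data cleaning: impute the missing income with the median.
clean["income"] = clean["income"].fillna(clean["income"].median())

# Data transformation: standardize income (zero mean, unit variance)
# so it can feed a machine-learning or AI algorithm.
clean["income_std"] = (clean["income"] - clean["income"].mean()) / clean["income"].std()
```

The same pipeline would normally be wrapped in reusable functions and validated on held-out data; this fragment only shows the shape of each step.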
Teaching methods
Lectures combining descriptions of theoretical foundations, illustrations, and implementation of the concepts covered.
Assessment method
The written exam (closed book; no reference materials allowed) consists of multiple-choice and multiple-response questions, and may also include open-ended questions.
References
- Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media.
- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Statistics/Probability Series.
- Brick, J. M. (2013). Unit nonresponse and weighting adjustments: A critical review. Journal of Official Statistics, 29(3), 329-353.
- Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 1-58.
- Davis, J. J., & Foo, E. (2016). Automated feature engineering for HTTP tunnel detection. Computers & Security, 59, 166-185.
- Gu, Q., Li, Z., & Han, J. (2011, September). Linear discriminant dimensionality reduction. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 549-564). Springer, Berlin, Heidelberg.
- He, Z., Xu, X., & Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9-10), 1641-1650.
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
- Hoffmann, H. (2007). Kernel PCA for novelty detection. Pattern Recognition, 40(3), 863-874.
- Holt, D., & Elliot, D. (1991). Methods of weighting for unit non-response. Journal of the Royal Statistical Society: Series D (The Statistician), 40(3), 333-342.
- Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065).
- Kaul, A., Maheshwary, S., & Pudi, V. (2017, November). AutoLearn: Automated feature generation and selection. In 2017 IEEE International Conference on Data Mining (ICDM) (pp. 217-226). IEEE.
- Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. CRC Press.
- Kohonen, T. (2013). Essentials of the self-organizing map. Neural Networks, 37, 52-65.
- Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd Edition). John Wiley & Sons, Inc.
- Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.
- Osborne, J. W. (2013). Best practices in data cleaning: A complete guide to everything you need to do before and after collecting your data. SAGE Publications, Inc.
- Osier, G. (2016). Unit non-response in household wealth surveys: Experience from the Eurosystem's Household Finance and Consumption Survey (No. 15). ECB Statistics Paper.
- Rokach, L., & Maimon, O. Z. (2008). Data mining with decision trees: Theory and applications (Vol. 69). World Scientific.
- Rosipal, R., Girolami, M., Trejo, L. J., & Cichocki, A. (2001). Kernel PCA for feature extraction and de-noising in nonlinear regression. Neural Computing & Applications, 10(3), 231-243.
- Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401-409.
- Van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd Edition). Chapman and Hall/CRC Press.
- Van der Maaten, L., Postma, E., & Van den Herik, J. (2009). Dimensionality reduction: A comparative review. Tilburg University, Technical Report TiCC-TR 2009-005.