Course description
Title of the Teaching Unit
Data Science and AI
Code of the Teaching Unit
21MQ061
Academic year
2025 - 2026
Cycle
Number of credits
5
Number of hours
60
Quarter
1
Weighting
Site
Montgomery
Teaching language
French
Teacher in charge
DENDONCKER Valentin
Objectives and contribution to the program
This course is an introduction to quantitative techniques for exploring and interpreting data, as well as data preparation techniques, with a view to using them in a project involving machine learning or artificial intelligence algorithms.
By the end of the course, students will be able to choose and apply a quantitative technique that will enable them to answer a question based on existing data.
Prerequisites and corequisites
Content
The course will cover the following topics:
- Data mining
- Data cleaning
- Data transformation
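As a minimal illustration of two of the listed topics, the sketch below shows typical data cleaning and data transformation steps in pandas. It is not taken from the course materials; the dataset and column names are hypothetical.

```python
# A hypothetical sketch (not from the course materials) of data cleaning
# and data transformation with pandas, ahead of a machine-learning step.
import pandas as pd

# Hypothetical raw data with inconsistent casing and missing values.
raw = pd.DataFrame({
    "city": ["Brussels", "brussels", "Liège", None],
    "income": [30000, 32000, None, 28000],
})

# Data cleaning: drop rows with no city label, then normalize casing.
clean = raw.dropna(subset=["city"]).copy()
clean["city"] = clean["city"].str.title()

# Data cleaning: impute the missing income with the median.
clean["income"] = clean["income"].fillna(clean["income"].median())

# Data transformation: standardize income (zero mean, unit variance)
# so it can feed a machine-learning or AI algorithm.
clean["income_std"] = (clean["income"] - clean["income"].mean()) / clean["income"].std()
```

The same pipeline would normally be wrapped in reusable functions and validated on held-out data; this fragment only shows the shape of each step.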
Teaching methods
Lectures combining descriptions of theoretical foundations, illustrations, and implementation of the concepts covered.
Assessment method
The written exam (closed book; no reference materials allowed) consists of multiple-choice and multiple-response questions, and may also include open-ended questions.
References
- Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media.
- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Statistics/Probability Series.
- Brick, J. M. (2013). Unit nonresponse and weighting adjustments: A critical review. Journal of Official Statistics, 29(3), 329-353.
- Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 1-58.
- Davis, J. J., & Foo, E. (2016). Automated feature engineering for HTTP tunnel detection. Computers & Security, 59, 166-185.
- Gu, Q., Li, Z., & Han, J. (2011, September). Linear discriminant dimensionality reduction. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 549-564). Springer, Berlin, Heidelberg.
- He, Z., Xu, X., & Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9-10), 1641-1650.
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
- Hoffmann, H. (2007). Kernel PCA for novelty detection. Pattern Recognition, 40(3), 863-874.
- Holt, D., & Elliot, D. (1991). Methods of weighting for unit non-response. Journal of the Royal Statistical Society: Series D (The Statistician), 40(3), 333-342.
- Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065).
- Kaul, A., Maheshwary, S., & Pudi, V. (2017, November). AutoLearn: Automated feature generation and selection. In 2017 IEEE International Conference on Data Mining (ICDM) (pp. 217-226). IEEE.
- Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. CRC Press.
- Kohonen, T. (2013). Essentials of the self-organizing map. Neural Networks, 37, 52-65.
- Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd Edition). John Wiley & Sons, Inc.
- Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.
- Osborne, J. W. (2013). Best practices in data cleaning: A complete guide to everything you need to do before and after collecting your data. SAGE Publications, Inc.
- Osier, G. (2016). Unit non-response in household wealth surveys: Experience from the Eurosystem's Household Finance and Consumption Survey (No. 15). ECB Statistics Paper.
- Rokach, L., & Maimon, O. Z. (2008). Data mining with decision trees: Theory and applications (Vol. 69). World Scientific.
- Rosipal, R., Girolami, M., Trejo, L. J., & Cichocki, A. (2001). Kernel PCA for feature extraction and de-noising in nonlinear regression. Neural Computing & Applications, 10(3), 231-243.
- Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401-409.
- Van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd Edition). Chapman and Hall/CRC Press.
- Van der Maaten, L., Postma, E., & Van den Herik, J. (2009). Dimensionality reduction: A comparative review. Tilburg University, Technical Report TiCC-TR 2009-005.