{"status":"public","_id":"3729","user_id":"220548","citation":{"bibtex":"@book{Kösters_Schöne_Kohlhase, title={Benchmarking of Machine Learning Models for Tabular Scarce Data}, author={Kösters, Justus and Schöne, Marvin and Kohlhase, Martin} }","ama":"Kösters J, Schöne M, Kohlhase M. Benchmarking of Machine Learning Models for Tabular Scarce Data.","mla":"Kösters, Justus, et al. Benchmarking of Machine Learning Models for Tabular Scarce Data.","short":"J. Kösters, M. Schöne, M. Kohlhase, Benchmarking of Machine Learning Models for Tabular Scarce Data, n.d.","alphadin":"Kösters, Justus ; Schöne, Marvin ; Kohlhase, Martin: Benchmarking of Machine Learning Models for Tabular Scarce Data","apa":"Kösters, J., Schöne, M., & Kohlhase, M. (n.d.). Benchmarking of Machine Learning Models for Tabular Scarce Data.","chicago":"Kösters, Justus, Marvin Schöne, and Martin Kohlhase. Benchmarking of Machine Learning Models for Tabular Scarce Data, n.d.","ieee":"J. Kösters, M. Schöne, and M. Kohlhase, Benchmarking of Machine Learning Models for Tabular Scarce Data. ."},"date_created":"2023-11-16T13:43:06Z","title":"Benchmarking of Machine Learning Models for Tabular Scarce Data","year":"2023","has_accepted_license":"1","file_date_updated":"2023-11-16T13:42:34Z","file":[{"success":1,"creator":"mschoene","file_size":329931,"file_name":"BenchmarkingMlModelsScarceData_Koesters.pdf","file_id":"3730","access_level":"open_access","content_type":"application/pdf","relation":"main_file","date_created":"2023-11-16T13:42:34Z","date_updated":"2023-11-16T13:42:34Z"}],"publication_status":"submitted","language":[{"iso":"eng"}],"department":[{"_id":"103"}],"date_updated":"2023-11-30T14:34:00Z","abstract":[{"lang":"eng","text":"Because laboratory experiments are costly and time-consuming, companies are interested in minimizing them during process or product design. To this end, machine learning can be used to extract knowledge from the process or product to predict future designs. 
Due to the high costs and time requirements, data from laboratory experiments are scarce, so only machine learning algorithms with small hypothesis spaces are suitable to predict such data. In this paper, the performance of Linear and Logistic Regression, Decision Trees, Gaussian Processes and Support Vector Machines is compared on five real-world datasets each for classification and regression. Decision Trees have the best and Gaussian Processes the worst overall performance, but Gaussian Processes show great potential if adequate hyperparameters are selected. For the analyzed data and the chosen hyperparameters, Gaussian Processes tend to overfit, whereas Support Vector Machines tend to underfit. Linear and Logistic Regression offer a good tradeoff between complexity and performance, producing results comparable to Decision Trees."}],"type":"working_paper","main_file_link":[{"open_access":"1"}],"keyword":["tabular scarce data","industrial design","supervised machine learning models"],"author":[{"full_name":"Kösters, Justus","last_name":"Kösters","first_name":"Justus"},{"last_name":"Schöne","full_name":"Schöne, Marvin","id":"218388","first_name":"Marvin"},{"id":"226669","first_name":"Martin","orcid":"0009-0002-9374-0720","last_name":"Kohlhase","full_name":"Kohlhase, Martin"}],"oa":"1"}