Thesis details
Application of machine learning methods for estimating apartment prices in the Czech Republic
Thesis title in Czech: Aplikace metod strojového učení pro odhad cen bytů v České republice
Thesis title in English: Application of machine learning methods for estimating apartment prices in the Czech Republic
Academic year of topic announcement: 2017/2018
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Economic Studies (23-IES)
Supervisor: prof. PhDr. Ladislav Krištoufek, Ph.D.
Author: hidden - assigned by the supervisor
Date of registration: 13.06.2018
Date of assignment: 13.06.2018
Date and time of defence: 16.09.2019 09:00
Venue of defence: Opletalova - Opletalova 26, O206, Opletalova - room no. 206
Date of electronic submission: 30.07.2019
Date of completed defence: 16.09.2019
Opponents: doc. PhDr. Jozef Baruník, Ph.D.
 
 
 
List of academic literature
1. Abdallah, S. & D. A. Khasha (2016): "Using Text Mining To Analyze Real Estate Classifieds." International Conference on Advanced Intelligent Systems and Informatics: pp. 193-202.
2. Čeh, M., M. Kilibarda, A. Lisec & B. Bajat (2018): "Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments." ISPRS International Journal of Geo-Information 7(5): p. 168.
3. Goldberg, Y. (2017): "Neural Network Methods for Natural Language Processing." Synthesis Lectures on Human Language Technologies 10(1): pp. 1-309.
4. Manjula, R., S. Jain, S. Srivastava, Kher (1996): "Real estate value prediction using multivariate regression models." IOP Conference Series: Materials Science and Engineering 263(4): pp. 141-53.
5. Nejad, M. Z., J. Lu, V. Behbood, Kher (2017): "Applying dynamic Bayesian tree in property sales price estimation." International Conference on Intelligent Systems and Knowledge Engineering (ISKE) 12.
6. Park, B. & J. K. Bae (2015): "Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data." Expert Systems with Applications 42(6): pp. 2928-2934.
7. Stevens, D. (2014): "Predicting Real Estate Price Using Text Mining, Automated Real Estate Description Analysis." Tilburg University School of Humanities.
8. Witten, I. H. (2017): "Data mining: practical machine learning tools and techniques." Elsevier.
Preliminary scope of work
Proposed Topic: The thesis will focus on models for estimating property prices in the Czech Republic. We will cover apartments only, as the available data are detailed and the sample size is adequate for our needs. The analysis will be performed on cross-sections obtained from the main real estate sites on the Czech market. The expected sample size is more than 10 000 observations, which should make our later conclusions sufficiently robust. Moreover, a large dataset is often a critical condition for applying machine learning methods.
Apartment offer prices are publicly known; thus, no dramatic deviations are expected. The models should be able to forecast the price of every new offer precisely, based on historical evidence. Finally, the models should make it possible to detect mispriced properties. The output of the empirical research will be compared with the results of similar works.
The aim of this work is not only to determine the most relevant parameters for estimating a property's market price but also to find the approach that provides the most accurate prediction. The final discussion will cover the potential replacement of the comparative method used in everyday practice.

Hypotheses:
1) Machine learning methods provide more accurate estimates of apartment prices than linear regression.
2) Advertising text descriptions have a significant impact on properties' offer prices.
3) The set of independent variables that determine apartment prices differs between Prague and the rest of the Czech Republic.

Methodology:
In the thesis, we will use two groups of methods for price estimation. First, linear regression will be employed. Although linear models are not used for expert evidence in common practice, numerous papers on them have been published. The following part will be dedicated to machine learning techniques, whose popularity has grown in recent years. We will apply the least absolute shrinkage and selection operator (LASSO), decision trees, random forests, and nearest-neighbor methods. The extensive empirical analysis will be the main component of the thesis, and the combination of different approaches should shed light on the Czech apartment market.
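As a rough illustration of the kind of comparison described above, the sketch below fits a one-variable linear regression and a nearest-neighbor regressor on synthetic data and compares their out-of-sample error. It is not the thesis's implementation: the data-generating formula, the single feature (floor area), the 75/25 split, and all function names are assumptions made for this example only.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Fit price = intercept + slope * area by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(train_x, train_y, x, k=5):
    """Average the prices of the k training apartments closest in area to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(price for _, price in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def compare_models(n=400, seed=0):
    """Return hold-out RMSE of both models on synthetic (assumed) data."""
    rng = random.Random(seed)
    # Assumption: price is a concave function of floor area plus Gaussian noise.
    areas = [rng.uniform(20, 120) for _ in range(n)]
    prices = [50_000 * a - 150 * a ** 2 + rng.gauss(0, 50_000) for a in areas]

    split = int(0.75 * n)  # first 75 % for training, the rest for testing
    train_x, test_x = areas[:split], areas[split:]
    train_y, test_y = prices[:split], prices[split:]

    intercept, slope = fit_simple_ols(train_x, train_y)
    ols_pred = [intercept + slope * x for x in test_x]
    knn_pred = [knn_predict(train_x, train_y, x) for x in test_x]
    return {"ols": rmse(test_y, ols_pred), "knn": rmse(test_y, knn_pred)}


if __name__ == "__main__":
    print(compare_models())
```

Because the synthetic relationship is deliberately nonlinear, the nearest-neighbor model has an advantage here; this says nothing about which method performs better on real listings, which is the empirical question the thesis sets out to answer.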
We expect that a substantial part of an apartment's price is determined by factors that cannot be easily quantified. For that reason, we will refine the models using text mining. To support or reject our third hypothesis, we will create restricted models that focus on Prague. We believe that the characteristics of the market in the Czech capital differ from those in the rest of the country.
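One simple way such text-based refinement could work is to turn each advertisement description into indicator variables that enter the price model alongside the structured attributes. The sketch below is only an assumed illustration: the keyword stems, the feature names, and the prefix-matching shortcut for Czech inflection are hypothetical and are not taken from the thesis.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase and strip diacritics so that 'Balkón' and 'balkon' match."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokenize(text):
    """Split normalized text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", normalize(text))


# Hypothetical keyword stems; a real study would derive them from the data.
KEYWORD_STEMS = {
    "balcony": "balkon",
    "renovated": "rekonstruk",
    "cellar": "sklep",
}


def keyword_features(description):
    """Return 0/1 indicators: does any token start with the given stem?"""
    tokens = tokenize(description)
    return {
        name: int(any(token.startswith(stem) for token in tokens))
        for name, stem in KEYWORD_STEMS.items()
    }


if __name__ == "__main__":
    print(keyword_features("Byt po rekonstrukci s balkónem."))
    # {'balcony': 1, 'renovated': 1, 'cellar': 0}
```

Prefix matching is a crude stand-in for proper stemming or lemmatization: it catches inflected forms such as "balkónem" but can also produce false matches, so it should be read as a starting point rather than a recommended method.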

Expected Contribution:
In-depth analysis of the content of listing descriptions is a relatively new and infrequently used approach. Hence, natural language processing (NLP) should add significant value to economic research. Moreover, the text will be predominantly in Czech. Because of its complicated grammar, the analysis will be more complex than in papers working with English text. Furthermore, this approach will be used for the first time in housing price analysis on the Czech market.
From a practical perspective, the detection of mispricing by models gives investors the opportunity to find the best pick, as even a small deviation from the correct price can turn into an outstanding deal. Text analysis should uncover hidden gems through automated data processing on a periodic basis. Additionally, the models should take into account the impact of large cities on price, which is not always done in similar works.
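The mispricing idea can be sketched as a simple rule: compare each asking price with the model's prediction and flag listings that fall below it by more than a chosen margin. Everything in the example is assumed for illustration, including the 15 % threshold, the listing fields, and the constant price-per-square-metre stand-in for a fitted model.

```python
def flag_underpriced(listings, predict, threshold=0.15):
    """Return (id, relative gap) for listings priced well below prediction.

    The relative gap is (asking price - predicted price) / predicted price,
    so a value of -0.25 means the listing is 25 % cheaper than predicted.
    """
    flagged = []
    for listing in listings:
        predicted = predict(listing)
        gap = (listing["price"] - predicted) / predicted
        if gap < -threshold:
            flagged.append((listing["id"], round(gap, 3)))
    # Most underpriced listings first.
    return sorted(flagged, key=lambda item: item[1])


def toy_predict(listing):
    """Hypothetical stand-in for a fitted model: 80 000 CZK per square metre."""
    return 80_000 * listing["area"]


if __name__ == "__main__":
    sample = [
        {"id": "A", "area": 50, "price": 4_000_000},
        {"id": "B", "area": 60, "price": 3_600_000},
        {"id": "C", "area": 40, "price": 3_150_000},
    ]
    print(flag_underpriced(sample, toy_predict))
    # [('B', -0.25)]
```

A flagged listing is only a candidate: the gap may reflect a defect that the model cannot see rather than a genuine bargain, which is one reason the proposal adds text-based features.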

Outline:
1) Introduction
2) Theoretical Background
3) Literature Review
4) Data
5) Methodology
a) Application of Text Mining
b) Conventional Econometric Estimation Methods
c) Machine Learning Methods
6) Results and Model Comparison
7) Conclusion
Charles University | Charles University Information System