Undoubling-of-real-estate-advertisement

The main objective of this project is to detect duplicate real estate ads in different avenues (social networks, website, etc.)

We do first clustering the properties using only the texts contained in the "DESCRIPTION" column of the dataframe. Then, for each cluster obtained previously, a micro-cluster is created using only certain columns (feature engineering) excluded from the "DESCRIPTION" column.

I got the following results:

Number of properties without duplicates: 959
Number of properties with duplicates: 357

Restitution of results:

I generated a zip file which contains a set of files useful for the interpretation of the algorithmic results.

The link to the zip is: https://drive.google.com/file/d/1Ng2WbagNHM1y2vE8y9fCztCfLqKW-37k/view?usp=sharing

Description of the files attached in the zip:

"duplicates.csv": It contains the properties present in duplicates. A set of duplicate assets is separated from the others by an empty line in the csv file
"sans_doublons.csv": It contains the properties that are not duplicated
"project.ipynb": python notebook

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Undoubling-of-real-estate-advertisement

The main objective of this project is to detect duplicate real estate ads in different avenues (social networks, website, etc.)

Restitution of results:

Description of the files attached in the zip:

About

Releases

Packages

landrydipanda/Undoubling-of-real-estate-advertisement

Folders and files

Latest commit

History

Repository files navigation

Undoubling-of-real-estate-advertisement

The main objective of this project is to detect duplicate real estate ads in different avenues (social networks, website, etc.)

Restitution of results:

Description of the files attached in the zip:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages