We determined which field has more young blood than the rest. We also determined which field has more experienced members.
1 parent ae6c897, commit cd40a3e
Showing 3 changed files with 155 additions and 115 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
Engineer_ID | ||
Field min median max | ||
الهندسة الصناعية والكيميائية والبترولية 2189 47467.0 60152 | ||
اختصاصات متفرقة 7596 44171.0 60035 | ||
الهندسة الميكانيكية 1177 38971.0 60163 | ||
الهندسة الكهربائية 844 37240.0 60162 | ||
الهندسة المدنية 14 34937.5 60164 | ||
الهندسة المعمارية 444 34088.5 60161 | ||
الهندسة الزراعية 1850 30970.0 60113 | ||
هندسة المناجم والتعدين والهندسة الجيولوجية 3752 21851.0 54811 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,213 +1,238 @@ | ||
# TL; DR (too long didn't read) | ||
|
||
The CSV with every engineer registered in Lebanon | ||
https://github.com/EbrahimKaram/LebaneseEngineers/blob/master/Data/all_engineers.csv | ||
<https://github.com/EbrahimKaram/LebaneseEngineers/blob/master/Data/all_engineers.csv> | ||
|
||
Early Data Analytics can be found here | ||
<https://github.com/EbrahimKaram/LebaneseEngineers#quick-answers> | ||
|
||
Read on if you want to know about the process and how it was done | ||
|
||
# How to get Every Engineer in Lebanon | ||
There is a website that allows to search the directory for engineers | ||
|
||
https://www.oea.org.lb/Arabic/MembersSearch.aspx?pageid=112 | ||
There is a website that allows to search the directory for engineers | ||
|
||
<https://www.oea.org.lb/Arabic/MembersSearch.aspx?pageid=112> | ||
|
||
now if just search we will get ability to download the excel but it doesn't have the latin names. You can check the excel they provide `OEA-All-Members.xlsx`. This is not what I like and is incomplete in my opinion. We can scrap the directory website and get what we need | ||
Now if you just search, you get the ability to download the Excel file, but it doesn't have the Latin names. You can check the Excel file they provide, `OEA-All-Members.xlsx`. This is not what I want, and it is incomplete in my opinion. We can scrape the directory website and get what we need | ||
|
||
We want to have a database of Latin names to Arabic names. It would be useful to train a model for later for Arabic to English or the other way around. | ||
|
||
|
||
## Let's see what the actual request is | ||
|
||
We open Developer tools and monitor the network and see what requests are being done when we click on search. | ||
|
||
We can see that the page is sending the following request | ||
``` | ||
https://www.oea.org.lb/Arabic/GetMembers.aspx?PageID=112&CurrPage=1&fstname=&lstname=&fatname=&numb=&spec=-1&spec1=-1&searchoption=And&rand=0.9449476735976416 | ||
PageID: 112 | ||
CurrPage: 1 | ||
fstname: | ||
lstname: | ||
fatname: | ||
numb: | ||
spec: -1 | ||
spec1: -1 | ||
searchoption: And | ||
rand: 0.9449476735976416 | ||
``` | ||
|
||
https://www.oea.org.lb/Arabic/GetMembers.aspx?PageID=112&CurrPage=1&fstname=&lstname=&fatname=&numb=&spec=-1&spec1=-1&searchoption=And&rand=0.9449476735976416 | ||
|
||
PageID: 112 | ||
CurrPage: 1 | ||
fstname: | ||
lstname: | ||
fatname: | ||
numb: | ||
spec: -1 | ||
spec1: -1 | ||
searchoption: And | ||
rand: 0.9449476735976416 | ||
|
||
If we plug that link into Google Chrome we can get the list of the first 20 names and it looks like this | ||
``` | ||
رقم المهندس: 14 | ||
الاسم: يحيى أحمد مزبودي | ||
Latin Name: Yehia Ahmad Mazboudi | ||
التفاصيل (link to more details) | ||
``` | ||
|
||
رقم المهندس: 14 | ||
الاسم: يحيى أحمد مزبودي | ||
Latin Name: Yehia Ahmad Mazboudi | ||
التفاصيل (link to more details) | ||
|
||
What happens when you press next? | ||
|
||
``` | ||
https://www.oea.org.lb/Arabic/GetMembers.aspx?PageID=112&CurrPage=3&fstname=&lstname=&fatname=&numb=&spec=-1&spec1=-1&searchoption=And&rand=0.055286690143709905 | ||
PageID: 112 | ||
CurrPage: 3 | ||
fstname: | ||
lstname: | ||
fatname: | ||
numb: | ||
spec: -1 | ||
spec1: -1 | ||
searchoption: And | ||
rand: 0.055286690143709905 | ||
``` | ||
https://www.oea.org.lb/Arabic/GetMembers.aspx?PageID=112&CurrPage=3&fstname=&lstname=&fatname=&numb=&spec=-1&spec1=-1&searchoption=And&rand=0.055286690143709905 | ||
|
||
PageID: 112 | ||
CurrPage: 3 | ||
fstname: | ||
lstname: | ||
fatname: | ||
numb: | ||
spec: -1 | ||
spec1: -1 | ||
searchoption: And | ||
rand: 0.055286690143709905 | ||
|
||
The `rand` value changes, but `CurrPage` also changes, which indicates pagination. We can't set `CurrPage` to -1; that gives an invalid request. | ||
`rand` doesn't seem to be doing much; it could be security-related. | ||
We notice that `CurrPage` starts at 1 instead of zero | ||
|
||
*What happens when we over increment currPage?* | ||
_What happens when we over increment currPage?_ | ||
|
||
We get the following response | ||
|
||
<div id="hiddenNoMore" class="noResDiv">لا يوجد أي نتيجة</div> | ||
|
||
These are Get Requests so we can do them from the browser and we don't need to use something like PostMan to test. | ||
|
||
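The request pattern above can be sketched in Python. This is only an illustration, not the code used in this repository: the helper names `build_members_url`, `is_last_page`, and `iter_pages` are hypothetical, the `fetch` callable is something you supply yourself (any function that takes a URL and returns the response HTML), and sending a fresh random float as `rand` is an assumption based on what the browser appears to do.

```python
import random
from urllib.parse import urlencode

BASE_URL = "https://www.oea.org.lb/Arabic/GetMembers.aspx"
# The "no results" div shown above marks the end of the listing.
NO_MORE_MARKER = 'id="hiddenNoMore"'


def build_members_url(page, spec=-1, spec1=-1):
    """Build the GET URL for one result page (CurrPage starts at 1)."""
    params = {
        "PageID": 112,
        "CurrPage": page,
        "fstname": "",
        "lstname": "",
        "fatname": "",
        "numb": "",
        "spec": spec,
        "spec1": spec1,
        "searchoption": "And",
        # Assumption: the site accepts any random float here.
        "rand": random.random(),
    }
    return BASE_URL + "?" + urlencode(params)


def is_last_page(html):
    """True when the response is the 'no results' placeholder."""
    return NO_MORE_MARKER in html


def iter_pages(fetch, spec=-1, spec1=-1, max_pages=10000):
    """Yield (page_number, html) until the 'no results' div appears."""
    page = 1
    while page <= max_pages:
        html = fetch(build_members_url(page, spec, spec1))
        if is_last_page(html):
            break
        yield page, html
        page += 1
```

Passing `fetch` in as a parameter keeps the paging logic testable without touching the network; `max_pages` is just a safety cap so a changed response format cannot cause an endless loop.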
# Missing info | ||
|
||
Now it seems that the database has fields that are requested but never shown. We can search by subfield and field but those are not reiterated in the results and are not provided in the excel that is easily downloadable. | ||
|
||
They call them here the following `نوع الاختصاص ` and 'حقل الاختصاص' | ||
They call them here the following `نوع الاختصاص` and 'حقل الاختصاص' | ||
|
||
Those would go into the spec and spec1 field of the get requests | ||
|
||
We are gonna call them accordingly | ||
field and subfield | ||
`نوع الاختصاص ` and `حقل الاختصاص` | ||
`نوع الاختصاص` and `حقل الاختصاص` | ||
We should probably start in getting IDs for the fields, their respective subfields, and the subfield IDs. (IDs are what the requests use) | ||
|
||
Requests | Arabic | Ours | ||
--|---|-- | ||
spec | نوع الاختصاص | fields | ||
spec1 | حقل الاختصاص | subfields | ||
|
||
| Requests | Arabic | Ours | | ||
| -------- | ------------ | --------- | | ||
| spec | نوع الاختصاص | fields | | ||
| spec1 | حقل الاختصاص | subfields | | ||
|
||
**Tech Tip** | ||
Please note that UTF-8 is not the default for excel. You need to change it following the link below | ||
https://techcommunity.microsoft.com/t5/excel/open-and-edit-a-csv-file-in-utf8/m-p/1035542 | ||
<https://techcommunity.microsoft.com/t5/excel/open-and-edit-a-csv-file-in-utf8/m-p/1035542> | ||
|
||
What we notice is that when we pick a field we don't get a list of subfields to choose from. They stay the same. | ||
|
||
The subfields are the same for all fields. This might be due to a time issue with implementation or just lazy implementation. This is just odd because the screen does load when you pick a field. | ||
|
||
## What can we do | ||
|
||
Let's see what happens when we try an unrelated field with a subfield. | ||
We will get the following response | ||
|
||
``` | ||
<div id="hiddenNoMore" class="noResDiv">لا يوجد أي نتيجة</div> | ||
``` | ||
<div id="hiddenNoMore" class="noResDiv">لا يوجد أي نتيجة</div> | ||
|
||
So maybe we should try all possible combinations and see what happens. | ||
We have 63 subfields and 10 fields, so we have a total of 630 field/subfield combinations to try. | ||
|
||
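As a rough sketch of that brute-force idea (not the actual contents of `GetTheFieldsAndSubfields.py`), the combinations can be enumerated with `itertools.product`. The ID ranges below are placeholders, since the real `spec` and `spec1` IDs come from the site, and `has_results` is a hypothetical callable that reports whether a request for a given pair returned any engineers.

```python
from itertools import product

# Placeholder IDs: the real spec/spec1 values come from the search form.
field_ids = range(1, 11)     # 10 fields
subfield_ids = range(1, 64)  # 63 subfields

# Every (field, subfield) pair that needs one request.
pairs = list(product(field_ids, subfield_ids))
print(len(pairs))  # 630


def valid_pairs(pairs, has_results):
    """Keep only the pairs whose request returned at least one engineer."""
    return [(f, s) for f, s in pairs if has_results(f, s)]
```

A pair whose response is only the `hiddenNoMore` div would be treated as "subfield does not belong to this field".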
### What we ended up with | ||
|
||
We got 62 subfields and we now know which subfields are under which fields. | ||
You can look into how that was done by checking | ||
`GetTheFieldsAndSubfields.py` | ||
|
||
The data is in the `Categories` folder | ||
|
||
# Building a database | ||
|
||
The ideal scenario is having a database with the following | ||
* Field | ||
* Subfield | ||
* Arabic Name | ||
* Latin Name | ||
* Engineer ID | ||
* Link to extra info for that individual on the order of engineers site | ||
|
||
- Field | ||
- Subfield | ||
- Arabic Name | ||
- Latin Name | ||
- Engineer ID | ||
- Link to extra info for that individual on the order of engineers site | ||
|
||
You can look at the `pullingTheDBv0.8.py` code to see how that was done. We put them into separate CSVs simply so we don't have to repeat the entire process if something breaks midway. Small steps towards the bigger goal are preferred over a giant leap. | ||
|
||
We now need to merge all that data into one CSV so it's easier to analyze. You can look at `mergeAllFiles.py` for the details on how that was done. | ||
|
||
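A minimal sketch of such a merge, using only the standard library, might look like the following. It is not the code in `mergeAllFiles.py`; the function name `merge_csv` is hypothetical, it assumes every partial CSV shares the same header, and the column names and second row in the demo are placeholders.

```python
import csv
import io


def merge_csv(sources, out):
    """Append the rows of several CSV streams into one, writing the header once.

    Assumes all sources share the same columns. Returns the number of rows written.
    """
    writer = None
    written = 0
    for src in sources:
        reader = csv.DictReader(src)
        for row in reader:
            if writer is None:
                # Take the column order from the first file that has rows.
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                writer.writeheader()
            writer.writerow(row)
            written += 1
    return written


# Demo with in-memory files; real use would pass open() file handles.
part1 = io.StringIO("Engineer_ID,Latin Name\n14,Yehia Ahmad Mazboudi\n")
part2 = io.StringIO("Engineer_ID,Latin Name\n444,Placeholder Name\n")
merged = io.StringIO()
row_count = merge_csv([part1, part2], merged)
print(row_count)
```

When working with real files, opening them with `encoding="utf-8"` and `newline=""` keeps the Arabic text and line endings intact.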
## Quick answers | ||
***How many engineers are registered as of February 6,2021?*** | ||
|
||
**_How many engineers are registered as of February 6,2021?_** | ||
|
||
65,949 | ||
Please note that the excel only mentions that we have 50,725 engineers. There might be duplicates in our file. We will check this now. | ||
So apparently we have engineers that specialize in more than one field. There are 15002 engineers that specialize in more than one subfield. | ||
|
||
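One way to check for engineers listed under more than one subfield is to group rows by ID and count the distinct subfields per ID. The sketch below uses toy rows rather than the real CSV, and the column names `Engineer_ID` and `Subfield` and the helper name are assumptions made for illustration.

```python
from collections import defaultdict


def multi_subfield_engineers(rows):
    """Map each Engineer_ID that has more than one distinct subfield to its subfields."""
    subfields = defaultdict(set)
    for row in rows:
        subfields[row["Engineer_ID"]].add(row["Subfield"])
    return {eid: subs for eid, subs in subfields.items() if len(subs) > 1}


# Toy data: engineer 14 appears under two subfields, engineer 444 under one.
rows = [
    {"Engineer_ID": 14, "Subfield": "A"},
    {"Engineer_ID": 14, "Subfield": "B"},
    {"Engineer_ID": 444, "Subfield": "A"},
]
repeated = multi_subfield_engineers(rows)
unique_ids = {row["Engineer_ID"] for row in rows}
print(len(rows), len(unique_ids), len(repeated))
```

Comparing the row count with the number of unique IDs shows how much of the gap between the merged file and the Excel total comes from repeated engineers.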
***What are the 3 most popular subfields*** | ||
**_What are the 3 most popular subfields_** | ||
|
||
Field | Subfield | Number | ||
--|---|-- | ||
الهندسة الكهربائية| الهندسة الكهربائية | 10566 | ||
الهندسة المدنية | الهندسة المدنية | 7055 | ||
الهندسة المدنية|مدني-عام | 6844 | ||
| Field | Subfield | Number | | ||
| ------------------ | ------------------ | ------ | | ||
| الهندسة الكهربائية | الهندسة الكهربائية | 10566 | | ||
| الهندسة المدنية | الهندسة المدنية | 7055 | | ||
| الهندسة المدنية | مدني-عام | 6844 | | ||
|
||
_**What are the most popular fields?**_ | ||
|
||
Field | Number | ||
--|-- | ||
الهندسة الكهربائية | 22035 | ||
الهندسة المدنية | 17616 | ||
الهندسة المعمارية | 12028 | ||
الهندسة الميكانيكية | 9618 | ||
الهندسة الزراعية | 3102 | ||
الهندسة الصناعية والكيميائية والبترولية | 1302 | ||
اختصاصات متفرقة | 225 | ||
هندسة المناجم والتعدين والهندسة الجيولوجية | 23 | ||
| Field | Number | | ||
| ------------------------------------------ | ------ | | ||
| الهندسة الكهربائية | 22035 | | ||
| الهندسة المدنية | 17616 | | ||
| الهندسة المعمارية | 12028 | | ||
| الهندسة الميكانيكية | 9618 | | ||
| الهندسة الزراعية | 3102 | | ||
| الهندسة الصناعية والكيميائية والبترولية | 1302 | | ||
| اختصاصات متفرقة | 225 | | ||
| هندسة المناجم والتعدين والهندسة الجيولوجية | 23 | | ||
|
||
_**Which Field has the younger engineers?**_ | ||
|
||
ID's are given incrementally. New memebers have bigger ID numbers. | ||
IDs are assigned incrementally. New members have bigger ID numbers than old members. | ||
|
||
Field | Average ID | ||
--|-- | ||
الهندسة الصناعية والكيميائية والبترولية | 42830.877880 | ||
اختصاصات متفرقة | 39955.746667 | ||
الهندسة الميكانيكية | 38171.511957 | ||
الهندسة الكهربائية | 36900.069027 | ||
الهندسة المعمارية | 33925.649734 | ||
الهندسة الزراعية | 32928.924242 | ||
الهندسة المدنية | 32895.640327 | ||
هندسة المناجم والتعدين والهندسة الجيولوجية | 23856.130435 | ||
| Field | Average ID | | ||
| ------------------------------------------ | ------------ | | ||
| الهندسة الصناعية والكيميائية والبترولية | 42830.877880 | | ||
| اختصاصات متفرقة | 39955.746667 | | ||
| الهندسة الميكانيكية | 38171.511957 | | ||
| الهندسة الكهربائية | 36900.069027 | | ||
| الهندسة المعمارية | 33925.649734 | | ||
| الهندسة الزراعية | 32928.924242 | | ||
| الهندسة المدنية | 32895.640327 | | ||
| هندسة المناجم والتعدين والهندسة الجيولوجية | 23856.130435 | | ||
|
||
Chemical engineering seems to have more recent members than old members. Civil engineering has more experienced engineers. | ||
|
||
Now looking at the median ID. We know where that 50% mark is exactly. It could be a better indicator than average. | ||
|
||
Agriculture needs some fresh blood. | ||
|
||
Field | Median ID | ||
--|-- | ||
الهندسة الصناعية والكيميائية والبترولية | 47467.0 | ||
اختصاصات متفرقة | 44171.0 | ||
الهندسة الميكانيكية | 38971.0 | ||
الهندسة الكهربائية | 37240.0 | ||
الهندسة المدنية | 34937.5 | ||
الهندسة المعمارية | 34088.5 | ||
الهندسة الزراعية | 30970.0 | ||
هندسة المناجم والتعدين والهندسة الجيولوجية | 21851.0 | ||
Now looking at the median ID. We know where that 50% mark is exactly. It could be a better indicator than average. | ||
|
||
| Field | Median ID | | ||
| ------------------------------------------ | --------- | | ||
| الهندسة الصناعية والكيميائية والبترولية | 47467.0 | | ||
| اختصاصات متفرقة | 44171.0 | | ||
| الهندسة الميكانيكية | 38971.0 | | ||
| الهندسة الكهربائية | 37240.0 | | ||
| الهندسة المدنية | 34937.5 | | ||
| الهندسة المعمارية | 34088.5 | | ||
| الهندسة الزراعية | 30970.0 | | ||
| هندسة المناجم والتعدين والهندسة الجيولوجية | 21851.0 | | ||
|
||
It would seem that agriculture would need some fresh blood. | ||
|
||
_**Which field has the earliest and latest members?**_ | ||
|
||
In a way I'm asking what the max and min ID are in each field. This would indicate in a sense the earliest and latest members | ||
|
||
| | Engineer_ID | | | | ||
| ------------------------------------------ | ----------- | ------- | ----- | | ||
| Field | min | median | max | | ||
| الهندسة الصناعية والكيميائية والبترولية | 2189 | 47467 | 60152 | | ||
| اختصاصات متفرقة | 7596 | 44171 | 60035 | | ||
| الهندسة الميكانيكية | 1177 | 38971 | 60163 | | ||
| الهندسة الكهربائية | 844 | 37240 | 60162 | | ||
| الهندسة المدنية | 14 | 34937.5 | 60164 | | ||
| الهندسة المعمارية | 444 | 34088.5 | 60161 | | ||
| الهندسة الزراعية | 1850 | 30970 | 60113 | | ||
| هندسة المناجم والتعدين والهندسة الجيولوجية | 3752 | 21851 | 54811 | | ||
|
||
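The per-field statistics in the tables above (average, median, min, and max ID) could be computed along these lines. This is a standard-library sketch on toy rows, not the analysis code from the repository; the function name `id_stats_by_field` is hypothetical, and the `Field` and `Engineer_ID` column names follow the table headers shown above.

```python
from collections import defaultdict
from statistics import mean, median


def id_stats_by_field(rows):
    """Return min, mean, median, and max Engineer_ID for every field."""
    ids = defaultdict(list)
    for row in rows:
        ids[row["Field"]].append(int(row["Engineer_ID"]))
    return {
        field: {
            "min": min(values),
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
        }
        for field, values in ids.items()
    }


# Toy rows only; the real input would be the merged CSV.
rows = [
    {"Field": "civil", "Engineer_ID": "14"},
    {"Field": "civil", "Engineer_ID": "100"},
    {"Field": "civil", "Engineer_ID": "60164"},
    {"Field": "mechanical", "Engineer_ID": "1177"},
    {"Field": "mechanical", "Engineer_ID": "38971"},
]
stats = id_stats_by_field(rows)
print(stats["civil"]["median"], stats["civil"]["mean"])
```

In the toy civil data the single very large ID pulls the mean far above the median, which is the reason the median can be a better indicator than the average.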
# Future Prospects for this project | ||
|
||
This allows for multiple projects in Machine learning and Data analysis. | ||
Some ideas for Machine learning: | ||
* Machine learning to write any Latin name in Arabic | ||
* From your name what is likelihood you will become an engineer | ||
* Arabic to Latin training | ||
* etc... | ||
|
||
- Machine learning to write any Latin name in Arabic | ||
- From your name, what is the likelihood you will become an engineer | ||
- Arabic to Latin training | ||
- etc... | ||
|
||
Some ideas for Data analysis | ||
* What is the most dominant last name in every engineering Discipline | ||
* How many people are in each Discipline | ||
* Answered Above | ||
* Which discipline is the least active (not a lot of new IDs) | ||
* This can be done by checking the average ID. IDs are given sequentially. New members get bigger IDs | ||
* Answered Above | ||
* A range of Age | ||
* What is the smallest ID and largest ID. A indicator of membership age. who is an old member. WHo is a new member | ||
* etc... | ||
|
||
- What is the most dominant last name in every engineering discipline | ||
- How many people are in each discipline | ||
  - Answered above | ||
- Which discipline is the least active (not a lot of new IDs) | ||
  - This can be done by checking the average ID. IDs are given sequentially. New members get bigger IDs | ||
  - Answered above | ||
- A range of age | ||
  - What are the smallest ID and the largest ID? They are an indicator of membership age: who is an old member and who is a new member | ||
- etc... | ||
|
||
Please download the complete CSV from [here](https://github.com/EbrahimKaram/LebaneseEngineers/blob/master/Data/all_engineers.csv) | ||
|
||
# Support | ||
|
||
If you liked this project and found it useful, I would really appreciate your support by buying me a drink via the link below | ||
|
||
https://www.buymeacoffee.com/bobKaram | ||
<https://www.buymeacoffee.com/bobKaram> |