-
Notifications
You must be signed in to change notification settings - Fork 1
/
notes
34 lines (24 loc) · 2.31 KB
/
notes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
still need to:
-automate grabbing the text from the NYT site?
-fix regex: mostly working; still doesn't do well on initials combined with end-of-name punctuation
-think through how handling author duplications - if author name doesn't match it adds a duplicate entry in GRdata. maybe this is good - have a record of all mappings between name-as-mentioned and GR name. - currently handling this by deduping once create the btb object
-more visualizations
-figure out Shiny?
-Who isn't getting GR hits? -mis-spellings, other issues
-need to account for case when GR search returns multiple authors with same name (didn't fix this yet, just manually changed the one that was causing the error)
Solved
-why is retrieval of stored xml objects crashing things - something about the pointers not pointing to the right place because of specification of environment? They are being created within the loop, I am using assign to create them in the global environment, but something is going awry
-they exist and take up a tiny bit of space but pointers don't work. there's no data in them. (an empty list with the same characteristics is the same size) Need to redo all of it, figure out how to actually get the data stored.
(x)removed all unstable xml objects (searchgr and authgr)
-looks like just converting to character and then back again works - changed authdetails function to do that
xadd in single data - done
xstill need to edit "add new data" code so that single data will be added with new interviewees - done
xfigure out character encoding
files pasted from web are UTF-8 (can see this at the bottom of the file in WordWrangler)
files created by python script are showing in WW as "Western (Mac OS Roman)"
If change this manually to UTF-8 and then specify encoding = "UTF-8" in read.csv, it works
need to figure out how to change the python script so they save as UTF-8
**gave up on this, doing it all in R which handles encoding better
xneed to go in and clean up mismatches - flagged as matchOK==F - added a process for that
xneed to account for case when GR search returns nothing - added line to authdetails function setting modalauthname to NA - not working??
xauthdetails function choking on "E.O. Wilson" for some reason. Also need to figure out which is better, spaces or no spaces. Also how to work through a loop with input.