Commit 4966277

Merge pull request #389 from cosanlab/fix
Fix
2 parents 3467117 + 7a07a33 commit 4966277

10 files changed

+393
-201
lines changed

examples/01_DataOperations/plot_adjacency.py

Lines changed: 23 additions & 19 deletions
@@ -25,8 +25,10 @@
2525
m1 = block_diag(np.ones((4, 4)), np.zeros((4, 4)), np.zeros((4, 4)))
2626
m2 = block_diag(np.zeros((4, 4)), np.ones((4, 4)), np.zeros((4, 4)))
2727
m3 = block_diag(np.zeros((4, 4)), np.zeros((4, 4)), np.ones((4, 4)))
28-
noisy = (m1*1+m2*2+m3*3) + np.random.randn(12, 12)*.1
29-
dat = Adjacency(noisy, matrix_type='similarity', labels=['C1']*4 + ['C2']*4 + ['C3']*4)
28+
noisy = (m1 * 1 + m2 * 2 + m3 * 3) + np.random.randn(12, 12) * 0.1
29+
dat = Adjacency(
30+
noisy, matrix_type="similarity", labels=["C1"] * 4 + ["C2"] * 4 + ["C3"] * 4
31+
)
3032

3133
#########################################################################
3234
# Basic information about the object can be viewed by simply calling it.
@@ -44,37 +46,39 @@
4446
dat.plot()
4547

4648
#########################################################################
47-
# The mean within a a grouping label can be calculated using the `.within_cluster_mean()` method. You must specify a group variable to group the data. Here we use the labels.
49+
# The mean within a grouping label can be calculated using the `.cluster_summary()` method. You must specify a group variable to group the data. Here we use the labels.
4850

49-
print(dat.within_cluster_mean(clusters=dat.labels))
51+
print(dat.cluster_summary(clusters=dat.labels, summary="within", metric="mean"))
5052

5153
#########################################################################
5254
# Regression
5355
# ----------
5456
#
5557
# Adjacency objects can currently accommodate two different types of regression. Sometimes we might want to decompose an Adjacency matrix from a linear combination of other Adjacency matrices. Other times we might want to perform a regression at each pixel in a stack of Adjacency matrices. Here we provide an example of each method. We use the same data we generated above, but attempt to decompose it by each block of data. We create the design matrix by simply concatenating the matrices we used to create the data object. The regress method returns a dictionary containing all of the relevant information from the regression. Here we show that the model recovers the average weight in each block.
5658

57-
X = Adjacency([m1, m2, m3], matrix_type='similarity')
59+
X = Adjacency([m1, m2, m3], matrix_type="similarity")
5860
stats = dat.regress(X)
59-
print(stats['beta'])
61+
print(stats["beta"])
6062

6163
#########################################################################
6264
# In addition to decomposing a single adjacency matrix, we can also estimate a model that predicts the variance over each voxel. This is equivalent to a univariate regression in imaging analyses. Remember that just like in imaging these tests are non-independent and may require correcting for multiple comparisons. Here we create some data that varies over matrices and identify pixels that follow a particular on-off-on pattern. We plot the t-values that exceed 2.
6365

6466
from nltools.data import Design_Matrix
6567
import matplotlib.pyplot as plt
6668

67-
data = Adjacency([m1 + np.random.randn(12,12)*.5 for x in range(5)] +
68-
[np.zeros((12, 12)) + np.random.randn(12, 12)*.5 for x in range(5)] +
69-
[m1 + np.random.randn(12, 12)*.5 for x in range(5)])
69+
data = Adjacency(
70+
[m1 + np.random.randn(12, 12) * 0.5 for x in range(5)]
71+
+ [np.zeros((12, 12)) + np.random.randn(12, 12) * 0.5 for x in range(5)]
72+
+ [m1 + np.random.randn(12, 12) * 0.5 for x in range(5)]
73+
)
7074

71-
X = Design_Matrix([1]*5 + [0]*5 + [1]*5)
75+
X = Design_Matrix([1] * 5 + [0] * 5 + [1] * 5)
7276
f = X.plot()
73-
f.set_title('Model', fontsize=18)
77+
f.set_title("Model", fontsize=18)
7478

7579
stats = data.regress(X)
76-
t = stats['t'].plot(vmin=2)
77-
plt.title('Significant Pixels',fontsize=18)
80+
t = stats["t"].plot(vmin=2)
81+
plt.title("Significant Pixels", fontsize=18)
7882

7983
#########################################################################
8084
# Similarity/Distance
@@ -88,22 +92,22 @@
8892
#########################################################################
8993
# We can also calculate the distance between multiple matrices contained within a single Adjacency object. Any distance metric is available from the `sci-kit learn <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html>`_ by specifying the `method` flag. This outputs an Adjacency matrix. In the example below we see that several matrices are more similar to each other (i.e., when the signal is on). Remember that the nodes here now represent each matrix from the original distance matrix.
9094

91-
dist = data.distance(metric='correlation')
95+
dist = data.distance(metric="correlation")
9296
dist.plot()
9397

9498
#########################################################################
9599
# Similarity matrices can be converted to and from Distance matrices using `.similarity_to_distance()` and `.distance_to_similarity()`.
96100

97-
dist.distance_to_similarity().plot()
101+
dist.distance_to_similarity(metric="correlation").plot()
98102

99103
#########################################################################
100104
# Multidimensional Scaling
101105
# ------------------------
102106
#
103107
# We can perform additional analyses on distance matrices such as multidimensional scaling. Here we provide an example to create a 3D multidimensional scaling plot of our data to see if the on and off matrices might naturally group together.
104108

105-
dist = data.distance(metric='correlation')
106-
dist.labels = ['On']*5 + ['Off']*5 + ['On']*5
109+
dist = data.distance(metric="correlation")
110+
dist.labels = ["On"] * 5 + ["Off"] * 5 + ["On"] * 5
107111
dist.plot_mds(n_components=3)
108112

109113
#########################################################################
@@ -114,9 +118,9 @@
114118

115119
import networkx as nx
116120

117-
dat = Adjacency(m1+m2+m3, matrix_type='similarity')
121+
dat = Adjacency(m1 + m2 + m3, matrix_type="similarity")
118122
g = dat.to_graph()
119123

120-
print('Degree of each node: %s' % g.degree())
124+
print("Degree of each node: %s" % g.degree())
121125

122126
nx.draw_circular(g)
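
The example above now passes `metric="correlation"` to `distance_to_similarity()`. The two conversions that the method distinguishes can be sketched in plain Python as follows. This is a minimal illustration, not the nltools implementation: the function works on a flat list of pairwise distances instead of an `Adjacency` object, and it uses only the standard library so it runs without numpy.

```python
import math
import statistics


def distance_to_similarity(dist, metric="correlation", beta=1.0):
    """Convert a flat list of pairwise distances into similarities.

    Hypothetical stand-in for the method in the diff; it takes a plain
    list rather than an Adjacency object.
    """
    if metric == "correlation":
        # Correlation distance is 1 - r, so subtracting from 1 recovers r.
        return [1.0 - d for d in dist]
    if metric == "euclidean":
        # Exponential decay, scaled by the population standard deviation
        # of the distances (the same default that numpy's .std() uses).
        scale = statistics.pstdev(dist)
        return [math.exp(-beta * d / scale) for d in dist]
    raise ValueError('metric can only be ["correlation", "euclidean"]')


correlation_sims = distance_to_similarity([0.0, 0.5, 2.0])
euclidean_sims = distance_to_similarity([1.0, 3.0], metric="euclidean")
```

The correlation branch is exactly invertible, whereas the exponential mapping depends on the spread of the distances, which is presumably why the metric has to be stated explicitly.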

nltools/data/adjacency.py

Lines changed: 121 additions & 51 deletions
@@ -25,6 +25,7 @@
2525
summarize_bootstrap,
2626
matrix_permutation,
2727
fisher_r_to_z,
28+
fisher_z_to_r,
2829
_calc_pvalue,
2930
_bootstrap_isc,
3031
)
@@ -36,7 +37,6 @@
3637
concatenate,
3738
_bootstrap_apply_func,
3839
_df_meta_to_arr,
39-
isiterable,
4040
)
4141
from .design_matrix import Design_Matrix
4242
from joblib import Parallel, delayed
@@ -114,29 +114,40 @@ def __init__(self, data=None, Y=None, matrix_type=None, labels=[], **kwargs):
114114
self.issymmetric = symmetric_all[0]
115115
self.matrix_type = matrix_type_all[0]
116116
self.is_single_matrix = False
117-
elif (isinstance(data, str) or isinstance(data, Path)) and (
118-
(".h5" in data) or (".hdf5" in data)
119-
):
120-
f = dd.io.load(data)
121-
self.data = f["data"]
122-
self.Y = pd.DataFrame(
123-
f["Y"],
124-
columns=[
125-
e.decode("utf-8") if isinstance(e, bytes) else e
126-
for e in f["Y_columns"]
127-
],
128-
index=[
117+
elif isinstance(data, str) or isinstance(data, Path):
118+
to_load = str(data)
119+
# Data is a string or path and h5
120+
if (".h5" in to_load) or (".hdf5" in to_load):
121+
f = dd.io.load(data)
122+
self.data = f["data"]
123+
self.Y = pd.DataFrame(
124+
f["Y"],
125+
columns=[
126+
e.decode("utf-8") if isinstance(e, bytes) else e
127+
for e in f["Y_columns"]
128+
],
129+
index=[
130+
e.decode("utf-8") if isinstance(e, bytes) else e
131+
for e in f["Y_index"]
132+
],
133+
)
134+
self.matrix_type = f["matrix_type"]
135+
self.is_single_matrix = f["is_single_matrix"]
136+
self.issymmetric = f["issymmetric"]
137+
self.labels = [
129138
e.decode("utf-8") if isinstance(e, bytes) else e
130-
for e in f["Y_index"]
131-
],
132-
)
133-
self.matrix_type = f["matrix_type"]
134-
self.is_single_matrix = f["is_single_matrix"]
135-
self.issymmetric = f["issymmetric"]
136-
self.labels = [
137-
e.decode("utf-8") if isinstance(e, bytes) else e for e in f["labels"]
138-
]
139-
return
139+
for e in f["labels"]
140+
]
141+
return
142+
# Data is a string or path but not h5
143+
else:
144+
(
145+
self.data,
146+
self.issymmetric,
147+
self.matrix_type,
148+
self.is_single_matrix,
149+
) = self._import_single_data(data, matrix_type=matrix_type)
150+
# Data is not a string or path
140151
else:
141152
(
142153
self.data,
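
The hunk above converts `data` to a string before testing for the `.h5` or `.hdf5` marker. A likely motivation is that a substring test against a `pathlib.Path` raises `TypeError`, because `Path` objects are not containers. The sketch below shows the normalization in isolation; `is_hdf5` is a hypothetical helper name and is not part of nltools.

```python
from pathlib import Path


def is_hdf5(source):
    """Return True when a str or Path looks like an HDF5 file name."""
    # `".h5" in Path(...)` raises TypeError, so normalize to str first.
    name = str(source)
    return (".h5" in name) or (".hdf5" in name)


from_path = is_hdf5(Path("results") / "adjacency.h5")
from_str = is_hdf5("adjacency.hdf5")
not_hdf5 = is_hdf5(Path("adjacency.csv"))
```

A stricter variant could compare `Path(source).suffix` against `{".h5", ".hdf5"}`, which would avoid matching a directory name that happens to contain `.h5`. The substring test is kept here because it mirrors the diff.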
@@ -511,6 +522,30 @@ def mean(self, axis=0):
511522
elif axis == 1:
512523
return np.nanmean(self.data, axis=axis)
513524

525+
def sum(self, axis=0):
526+
"""Calculate sum of Adjacency
527+
528+
Args:
529+
axis: (int) calculate sum over features (0) or data (1).
530+
For data it will be on upper triangle.
531+
532+
Returns:
533+
sum: float if single, adjacency if axis=0, np.array if axis=1
534+
and multiple
535+
536+
"""
537+
538+
if self.is_single_matrix:
539+
return np.nansum(self.data)
540+
else:
541+
if axis == 0:
542+
return Adjacency(
543+
data=np.nansum(self.data, axis=axis),
544+
matrix_type=self.matrix_type + "_flat",
545+
)
546+
elif axis == 1:
547+
return np.nansum(self.data, axis=axis)
548+
514549
def std(self, axis=0):
515550
"""Calculate standard deviation of Adjacency
516551
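
The new `sum()` method relies on `np.nansum`, which treats missing values as absent instead of letting them propagate. The following dependency-free sketch illustrates the two axis conventions on a small stack of flattened matrices. The `nansum` helper and the sample data are illustrative assumptions, written with the standard library only.

```python
import math


def nansum(values):
    """Sum an iterable of floats, skipping NaN entries."""
    return sum(v for v in values if not math.isnan(v))


nan = float("nan")
# Two "matrices", each flattened to three elements (one row per matrix).
stack = [
    [1.0, 2.0, nan],
    [4.0, nan, 6.0],
]

# axis=0: combine the matrices element by element.
across_matrices = [nansum(column) for column in zip(*stack)]
# axis=1: one total per matrix.
per_matrix = [nansum(row) for row in stack]
```

With `axis=0` the result has one entry per matrix element, which is why the method wraps it in a new `Adjacency`. With `axis=1` there is one number per matrix.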
@@ -752,6 +787,13 @@ def r_to_z(self):
752787
out.data = fisher_r_to_z(out.data)
753788
return out
754789

790+
def z_to_r(self):
791+
""" Convert z score back into r value for each element of data object"""
792+
793+
out = self.copy()
794+
out.data = fisher_z_to_r(out.data)
795+
return out
796+
755797
def threshold(self, upper=None, lower=None, binarize=False):
756798
"""Threshold Adjacency instance. Provide upper and lower values or
757799
percentages to perform two-sided thresholding. Binarize will return
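
The `z_to_r()` method added above is the inverse of `r_to_z()`. Assuming the nltools helpers compute the standard Fisher transformation, the pair can be sketched with the standard library as shown below. The names `r_to_z` and `z_to_r` here are local stand-ins, not the library functions.

```python
import math


def r_to_z(r):
    """Fisher transform: z = atanh(r), defined for -1 < r < 1."""
    return math.atanh(r)


def z_to_r(z):
    """Inverse Fisher transform: r = tanh(z)."""
    return math.tanh(z)


# A common use: average correlations in z space, then map back to r.
correlations = [0.2, 0.5, 0.8]
mean_z = sum(r_to_z(r) for r in correlations) / len(correlations)
mean_r = z_to_r(mean_z)
```

Averaging in z space and converting back gives a value that differs from the arithmetic mean of the raw correlations, because the transform is nonlinear.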
@@ -1067,7 +1109,7 @@ def isc(
10671109
exclude_self_corr=exclude_self_corr,
10681110
random_state=random_state,
10691111
)
1070-
for i in range(n_bootstraps)
1112+
for _ in range(n_bootstraps)
10711113
)
10721114

10731115
stats["p"] = _calc_pvalue(all_bootstraps - stats["isc"], stats["isc"], tail)
@@ -1185,57 +1227,85 @@ def plot_mds(
11851227
ax.xaxis.set_visible(False)
11861228
ax.yaxis.set_visible(False)
11871229

1188-
def distance_to_similarity(self, beta=1):
1189-
"""Convert distance matrix to similarity matrix
1230+
def distance_to_similarity(self, metric="correlation", beta=1):
1231+
"""Convert distance matrix to similarity matrix.
1232+
1233+
Note: currently only implemented for correlation and euclidean.
11901234
11911235
Args:
1192-
beta: (float) parameter to scale exponential function (default: 1)
1236+
metric: (str) Can only be correlation or euclidean
1237+
beta: (float) parameter to scale exponential function (default: 1) for euclidean
11931238
11941239
Returns:
11951240
out: (Adjacency) Adjacency object
11961241
11971242
"""
11981243
if self.matrix_type == "distance":
1199-
return Adjacency(
1200-
np.exp(-beta * self.squareform() / self.squareform().std()),
1201-
labels=self.labels,
1202-
matrix_type="similarity",
1203-
)
1244+
if metric == "correlation":
1245+
return Adjacency(1 - self.squareform(), matrix_type="similarity")
1246+
elif metric == "euclidean":
1247+
return Adjacency(
1248+
np.exp(-beta * self.squareform() / self.squareform().std()),
1249+
labels=self.labels,
1250+
matrix_type="similarity",
1251+
)
1252+
else:
1253+
raise ValueError('metric can only be ["correlation","euclidean"]')
12041254
else:
12051255
raise ValueError("Matrix is not a distance matrix.")
12061256

1207-
def similarity_to_distance(self):
1208-
"""Convert similarity matrix to distance matrix"""
1209-
if self.matrix_type == "similarity":
1210-
return Adjacency(
1211-
1 - self.squareform(), labels=self.labels, matrix_type="distance"
1212-
)
1213-
else:
1214-
raise ValueError("Matrix is not a similarity matrix.")
1257+
def cluster_summary(self, clusters=None, metric="mean", summary="within"):
1258+
"""This function provides summaries of clusters within Adjacency matrices.
12151259
1216-
def within_cluster_mean(self, clusters=None):
1217-
"""This function calculates mean within cluster labels
1260+
It can compute mean/median of within and between cluster values. Requires a
1261+
list of cluster ids indicating the row/column of each cluster.
12181262
12191263
Args:
12201264
clusters: (list) list of cluster labels
1265+
metric: (str or None) summarize with 'mean' or 'median'. If None, return all r values
1266+
summary: (str) summarize within cluster or between clusters
1267+
12211268
Returns:
12221269
dict: (dict) within cluster means
1270+
12231271
"""
1272+
if metric not in ["mean", "median", None]:
1273+
raise ValueError("metric must be ['mean','median', None]")
12241274

12251275
distance = pd.DataFrame(self.squareform())
12261276
clusters = np.array(clusters)
12271277

12281278
if len(clusters) != distance.shape[0]:
12291279
raise ValueError("Cluster labels must be same length as distance matrix")
12301280

1231-
out = pd.DataFrame(columns=["Mean", "Label"], index=None)
12321281
out = {}
12331282
for i in list(set(clusters)):
1234-
out[i] = np.mean(
1235-
distance.loc[clusters == i, clusters == i].values[
1236-
np.triu_indices(sum(clusters == i), k=1)
1237-
]
1238-
)
1283+
if summary == "within":
1284+
if metric == "mean":
1285+
out[i] = np.mean(
1286+
distance.loc[clusters == i, clusters == i].values[
1287+
np.triu_indices(sum(clusters == i), k=1)
1288+
]
1289+
)
1290+
elif metric == "median":
1291+
out[i] = np.median(
1292+
distance.loc[clusters == i, clusters == i].values[
1293+
np.triu_indices(sum(clusters == i), k=1)
1294+
]
1295+
)
1296+
elif metric is None:
1297+
out[i] = distance.loc[clusters == i, clusters == i].values[
1298+
np.triu_indices(sum(clusters == i), k=1)
1299+
]
1300+
elif summary == "between":
1301+
if metric == "mean":
1302+
out[i] = distance.loc[clusters == i, clusters != i].mean().mean()
1303+
elif metric == "median":
1304+
out[i] = (
1305+
distance.loc[clusters == i, clusters != i].median().median()
1306+
)
1307+
elif metric is None:
1308+
out[i] = distance.loc[clusters == i, clusters != i]
12391309
return out
12401310

12411311
def regress(self, X, mode="ols", **kwargs):
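
The within and between summaries computed by `cluster_summary()` can be illustrated on a plain nested list. This is a simplified sketch under stated assumptions: `summarize_clusters` is a hypothetical name, it takes a square symmetric matrix as a list of lists, and it pools all between-cluster values before reducing. The diff instead takes the median of column medians for the between case, so the two can differ when `metric="median"`.

```python
import statistics


def summarize_clusters(matrix, clusters, metric="mean", summary="within"):
    """Summarize within- or between-cluster values of a square matrix."""
    reducers = {"mean": statistics.mean, "median": statistics.median}
    if metric not in reducers:
        raise ValueError("metric must be 'mean' or 'median'")
    if len(clusters) != len(matrix):
        raise ValueError("Cluster labels must be same length as matrix")

    n = len(clusters)
    out = {}
    for label in set(clusters):
        if summary == "within":
            # Upper triangle only (j > i), so each pair counts once and
            # the diagonal is excluded.
            values = [
                matrix[i][j]
                for i in range(n)
                for j in range(i + 1, n)
                if clusters[i] == label and clusters[j] == label
            ]
        elif summary == "between":
            # Rows inside the cluster against columns outside it.
            values = [
                matrix[i][j]
                for i in range(n)
                for j in range(n)
                if clusters[i] == label and clusters[j] != label
            ]
        else:
            raise ValueError("summary must be 'within' or 'between'")
        out[label] = reducers[metric](values)
    return out


similarity = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.4, 0.6, 1.0],
]
labels = ["A", "A", "B", "B"]
within = summarize_clusters(similarity, labels, summary="within")
between = summarize_clusters(similarity, labels, summary="between")
```

A cluster with a single member has no within-cluster pairs, so this sketch raises `statistics.StatisticsError` in that case.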
@@ -1281,11 +1351,11 @@ def regress(self, X, mode="ols", **kwargs):
12811351
def social_relations_model(self, summarize_results=True, nan_replace=True):
12821352
"""Estimate the social relations model from a matrix for a round-robin design.
12831353
1284-
X_{ij} = m + \alpha_i + \beta_j + g_{ij} + \episolon_{ijl}
1354+
X_{ij} = m + \alpha_i + \beta_j + g_{ij} + \epsilon_{ijl}
12851355
12861356
where X_{ij} is the score for person i rating person j, m is the group mean,
12871357
\alpha_i is person i's actor effect, \beta_j is person j's partner effect, g_{ij}
1288-
is the relationship effect and \episolon_{ijl} is the error in measure l for actor i and partner j.
1358+
is the relationship effect and \epsilon_{ijl} is the error in measure l for actor i and partner j.
12891359
12901360
This model is primarily concerned with partitioning the variance of the various effects.
12911361
@@ -1551,7 +1621,7 @@ def fix_missing(data):
15511621
return (X, coord)
15521622

15531623
if nan_replace:
1554-
data, coord = replace_missing(self)
1624+
data, _ = replace_missing(self)
15551625
else:
15561626
data = self.copy()
15571627
