Cancer networks and data

This notebook demonstrates network retrieval from the STRING database, basic analysis, TCGA data loading, and visualization in Cytoscape, all from Python using the py4cytoscape package.

by Kozo Nishida, Alexander Pico, Barry Demchak

py4cytoscape 0.0.10

Prerequisites

In addition to this package (py4cytoscape), you will need:

  • Cytoscape 3.8 or greater, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen.

  • Complete installation wizard

  • Launch Cytoscape

  • If your Cytoscape version is 3.8.2 or earlier, install the FileTransfer app.

NOTE: To run this notebook, you must manually start Cytoscape first – don’t proceed until you have started Cytoscape.

Setup required only in a remote notebook environment

If you’re using a remote Jupyter Notebook environment such as Google Colab, run the cell below. (If you’re running a local Jupyter Notebook server on the same machine as Cytoscape, you can skip this step.)

[ ]:
_PY4CYTOSCAPE = 'git+https://github.com/cytoscape/py4cytoscape@0.0.10'
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client

Note that to use the current py4cytoscape release (instead of v0.0.10), remove the _PY4CYTOSCAPE= line in the snippet above.

Sanity test to verify Cytoscape connection

By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn’t alter the state of Cytoscape, but verifies that you have everything installed.

[ ]:
import py4cytoscape as p4c
[2]:
p4c.cytoscape_ping()
You are connected to Cytoscape!
[2]:
'You are connected to Cytoscape!'
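
If Cytoscape is still starting up, the ping can fail on the first try. The following is a minimal sketch of a retry loop. It is not part of py4cytoscape: wait_for is a hypothetical helper, and it assumes that the check you pass in raises an exception while the connection is unavailable.

```python
import time

def wait_for(check, attempts=5, delay=2.0):
    """Call check() until it succeeds or the attempts run out (hypothetical helper)."""
    last_error = None
    for attempt in range(attempts):
        try:
            return check()
        except Exception as err:  # assumption: a failed connection raises an exception
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)  # give Cytoscape time to finish starting
    raise RuntimeError(f"no success after {attempts} attempts") from last_error
```

With Cytoscape running or starting, this could be used as wait_for(p4c.cytoscape_ping).
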
[ ]:
p4c.install_app('STRINGapp')

Getting Disease Networks

Use Cytoscape to query the STRING database for networks of genes associated with breast cancer and ovarian cancer.

Note: if the STRING app is not installed, no error is reported, but your network will be empty.

Query STRING database by disease to generate networks

Breast cancer

[3]:
string_cmd = 'string disease query disease="breast cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)
[3]:
["Loaded network 'STRING network - breast cancer' with 150 nodes and 877 edges"]
[4]:
p4c.notebook_show_image()
[4]:
../_images/tutorials_Cancer-networks-and-data_8_0.png

Here we pass a command in Cytoscape’s command-line syntax, which can be used for any core or app automation function, to p4c.commands_run, which sends it to Cytoscape as a GET request. Use p4c.commands_help to interrogate the functions and parameters available in your active Cytoscape session, including those of the apps you’ve installed!

[5]:
p4c.commands_help('string')
[5]:
['add nodes',
 'change confidence',
 'change type',
 'compound query',
 'disease query',
 'enable glass',
 'expand',
 'filter enrichment',
 'hide charts',
 'list species',
 'make string',
 'protein query',
 'pubmed query',
 'retrieve enrichment',
 'retrieve publications',
 'settings',
 'show charts',
 'show enrichment',
 'show images',
 'show labels',
 'show publications',
 'stringify',
 'version']
[6]:
p4c.commands_help('string disease query')
[6]:
['cutoff', 'disease', 'limit', 'networkType', 'species', 'taxonID']
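
Command strings like the ones used above can also be assembled from Python values. This is a small sketch: build_command is a hypothetical helper (not a py4cytoscape function), and it assumes that quoting every string argument is acceptable, as with the disease and species arguments used earlier.

```python
def build_command(command, **args):
    """Join a command name and its arguments into one command string (hypothetical helper)."""
    parts = [command]
    for name, value in args.items():
        if isinstance(value, str):
            parts.append(f'{name}="{value}"')  # quote strings, which may contain spaces
        else:
            parts.append(f'{name}={value}')    # numbers are written as-is
    return ' '.join(parts)

string_cmd = build_command('string disease query',
                           disease='breast cancer', cutoff=0.9,
                           species='Homo sapiens', limit=150)
print(string_cmd)
```

The resulting string can be passed to p4c.commands_run(string_cmd) as before. The argument names are the ones listed by p4c.commands_help('string disease query').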

Ovarian cancer

[7]:
string_cmd = 'string disease query disease="ovarian cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)
[7]:
["Loaded network 'STRING network - ovarian cancer' with 150 nodes and 754 edges"]
[8]:
p4c.notebook_show_image()
[8]:
../_images/tutorials_Cancer-networks-and-data_14_0.png
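
Because a missing STRING app produces an empty network rather than an error, it can help to check the result of each query. The sketch below parses the message returned by p4c.commands_run, assuming it always follows the “Loaded network … with N nodes and M edges” format shown in the outputs above. parse_load_message is a hypothetical helper, not part of py4cytoscape.

```python
import re

# Assumed message format, taken from the outputs shown in this notebook.
LOAD_PATTERN = re.compile(
    r"Loaded network '(?P<name>.+)' with (?P<nodes>\d+) nodes and (?P<edges>\d+) edges")

def parse_load_message(message):
    """Return (name, node count, edge count) and fail loudly on an empty network."""
    match = LOAD_PATTERN.search(message)
    if match is None:
        raise ValueError(f"unexpected message: {message!r}")
    nodes, edges = int(match['nodes']), int(match['edges'])
    if nodes == 0:
        raise ValueError(f"network {match['name']!r} is empty; is the STRING app installed?")
    return match['name'], nodes, edges

# Message copied from the ovarian cancer query output above.
result = ["Loaded network 'STRING network - ovarian cancer' with 150 nodes and 754 edges"]
print(parse_load_message(result[0]))
```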

Interacting with Cytoscape

Now that we’ve got a couple of networks into Cytoscape, let’s see what we can do with them from Python…

Get list of networks

[9]:
p4c.get_network_list()
[9]:
['STRING network - breast cancer', 'STRING network - ovarian cancer']

Layout network

[10]:
p4c.layout_network(layout_name='circular')
[10]:
{}
[11]:
p4c.notebook_show_image()
[11]:
../_images/tutorials_Cancer-networks-and-data_19_0.png

List of layout algorithms available

[12]:
p4c.get_layout_names()
[12]:
['attribute-circle',
 'fruchterman-rheingold-3D',
 'stacked-node-layout',
 'spherical',
 'degree-circle',
 'circular',
 'box',
 'attributes-layout',
 'kamada-kawai',
 'force-directed',
 'grid3D',
 'cose',
 'flatten',
 'grid',
 'hierarchical',
 'center3d',
 'fruchterman-rheingold',
 'isom',
 'force-directed-cl']

Layout with parameters!

[13]:
p4c.get_layout_property_names(layout_name='force-directed')
[13]:
['numIterations',
 'defaultSpringCoefficient',
 'defaultSpringLength',
 'defaultNodeMass',
 'isDeterministic',
 'singlePartition']
[14]:
p4c.layout_network('force-directed defaultSpringCoefficient=0.0000008 defaultSpringLength=70')
[14]:
{}
[15]:
p4c.notebook_show_image()
[15]:
../_images/tutorials_Cancer-networks-and-data_25_0.png

Get table data from network

Now, let’s look at the tabular data associated with our STRING networks…

[16]:
p4c.get_table_column_names('node')
[16]:
['SUID',
 'shared name',
 'name',
 'selected',
 'stringdb::canonical name',
 'display name',
 'stringdb::full name',
 'stringdb::database identifier',
 'stringdb::description',
 '@id',
 'stringdb::namespace',
 'stringdb::node type',
 'query term',
 'stringdb::sequence',
 'stringdb::species',
 'stringdb::STRING style',
 'stringdb::enhancedLabel Passthrough',
 'compartment::cytoskeleton',
 'compartment::cytosol',
 'compartment::endoplasmic reticulum',
 'compartment::endosome',
 'compartment::extracellular',
 'compartment::golgi apparatus',
 'compartment::lysosome',
 'compartment::mitochondrion',
 'compartment::nucleus',
 'compartment::peroxisome',
 'compartment::plasma membrane',
 'stringdb::structures',
 'target::development level',
 'target::family',
 'tissue::adrenal gland',
 'tissue::blood',
 'tissue::bone',
 'tissue::bone marrow',
 'tissue::eye',
 'tissue::gall bladder',
 'tissue::heart',
 'tissue::intestine',
 'tissue::kidney',
 'tissue::liver',
 'tissue::lung',
 'tissue::muscle',
 'tissue::nervous system',
 'tissue::pancreas',
 'tissue::saliva',
 'tissue::skin',
 'tissue::spleen',
 'tissue::stomach',
 'tissue::thyroid gland',
 'tissue::urine',
 'stringdb::disease score']

One of the great things about the STRING database is all the node and edge attributes it provides. Let’s pull some of them into Python to play with…

Retrieve disease scores

We can retrieve any set of columns from Cytoscape and store them as a Python pandas.DataFrame keyed by SUID. In this case, let’s retrieve the disease score column from the node table. The table name and the column name are our two arguments:

[21]:
disease_score_table = p4c.get_table_columns('node','stringdb::disease score')
[22]:
disease_score_table
[22]:
stringdb::disease score
2304 2.645192
2305 2.802262
2306 2.556009
2307 5.0
2308 2.476258
... ...
2299 2.688657
2300 2.527034
2301 3.200604
2302 2.558843
2303 2.525171

150 rows × 1 columns

[24]:
disease_score = disease_score_table['stringdb::disease score'].astype('float')
node_suid = disease_score_table.index.values.astype(str)
[25]:
disease_score
[25]:
2304    2.645192
2305    2.802262
2306    2.556009
2307    5.000000
2308    2.476258
          ...
2299    2.688657
2300    2.527034
2301    3.200604
2302    2.558843
2303    2.525171
Name: stringdb::disease score, Length: 150, dtype: float64
[26]:
node_suid
[26]:
array(['2304', '2305', '2306', '2307', '2308', '2309', '2310', '2311',
       '2312', '2313', '2314', '2315', '2316', '2317', '2318', '2169',
       '2170', '2171', '2172', '2173', '2174', '2175', '2176', '2177',
       '2178', '2179', '2180', '2181', '2182', '2183', '2184', '2185',
       '2186', '2187', '2188', '2189', '2190', '2191', '2192', '2193',
       '2194', '2195', '2196', '2197', '2198', '2199', '2200', '2201',
       '2202', '2203', '2204', '2205', '2206', '2207', '2208', '2209',
       '2210', '2211', '2212', '2213', '2214', '2215', '2216', '2217',
       '2218', '2219', '2220', '2221', '2222', '2223', '2224', '2225',
       '2226', '2227', '2228', '2229', '2230', '2231', '2232', '2233',
       '2234', '2235', '2236', '2237', '2238', '2239', '2240', '2241',
       '2242', '2243', '2244', '2245', '2246', '2247', '2248', '2249',
       '2250', '2251', '2252', '2253', '2254', '2255', '2256', '2257',
       '2258', '2259', '2260', '2261', '2262', '2263', '2264', '2265',
       '2266', '2267', '2268', '2269', '2270', '2271', '2272', '2273',
       '2274', '2275', '2276', '2277', '2278', '2279', '2280', '2281',
       '2282', '2283', '2284', '2285', '2286', '2287', '2288', '2289',
       '2290', '2291', '2292', '2293', '2294', '2295', '2296', '2297',
       '2298', '2299', '2300', '2301', '2302', '2303'], dtype='<U21')
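
The conversion above can be tried without Cytoscape. The sketch below uses a small stand-in table with three values copied from the output above. The values are stored as strings on the assumption that the column may not arrive as floats, which is what makes the astype('float') step worthwhile.

```python
import pandas as pd

# Stand-in for the table returned by p4c.get_table_columns; SUIDs form the index.
table = pd.DataFrame({'stringdb::disease score': ['2.645192', '2.802262', '5.0']},
                     index=[2304, 2305, 2307])

scores = table['stringdb::disease score'].astype('float')  # numeric values for statistics
suids = table.index.values.astype(str)                      # SUIDs as strings, e.g. for axis labels

print(scores.dtype)
print(suids.tolist())
```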

Plot distribution and pick threshold

Now you can use Python to explore the data as you normally would.

[27]:
import matplotlib.pyplot as plt
plt.figure(figsize=(25.6,19.2))
plt.xticks(rotation=270)
plt.scatter(node_suid, disease_score)
[27]:
<matplotlib.collections.PathCollection at 0x1b0d8d6f250>
../_images/tutorials_Cancer-networks-and-data_35_1.png
[28]:
disease_score.describe()
[28]:
count    150.000000
mean       3.062757
std        0.844256
min        2.435369
25%        2.553742
50%        2.691015
75%        3.041286
max        5.000000
Name: stringdb::disease score, dtype: float64

Generate subnetworks

In order to reflect your exploration back onto the network, let’s generate subnetworks…

…from top quartile of ‘disease score’

[29]:
top_quart = disease_score.quantile(q=0.75)
[30]:
top_quart
[30]:
3.04128575
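
The threshold returned by quantile need not be one of the observed scores: by default, pandas interpolates linearly between the two nearest values. The sketch below reproduces that rule with the standard library on made-up numbers. linear_quantile is a hypothetical helper written for illustration only.

```python
import math

def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest sorted values."""
    data = sorted(values)
    position = (len(data) - 1) * q           # fractional index into the sorted data
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])

toy_scores = [1.0, 2.0, 3.0, 4.0]             # made-up values, not STRING data
print(linear_quantile(toy_scores, 0.75))      # 3.25, which lies between 3.0 and 4.0
```

Note that a strict > comparison against the threshold excludes any value exactly equal to it.
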
[31]:
top_nodes = disease_score[disease_score > top_quart].index.values.astype(str)
[32]:
top_nodes.tolist()
[32]:
['2307',
 '2311',
 '2313',
 '2314',
 '2316',
 '2317',
 '2182',
 '2186',
 '2187',
 '2188',
 '2189',
 '2190',
 '2194',
 '2195',
 '2196',
 '2199',
 '2200',
 '2203',
 '2208',
 '2214',
 '2215',
 '2218',
 '2221',
 '2226',
 '2228',
 '2230',
 '2231',
 '2232',
 '2234',
 '2239',
 '2240',
 '2243',
 '2261',
 '2275',
 '2277',
 '2290',
 '2292',
 '2301']
[33]:
p4c.create_subnetwork(top_nodes.tolist(), subnetwork_name='top disease quartile')
#returns a Cytoscape network SUID
[33]:
3978
[34]:
p4c.notebook_show_image()
[34]:
../_images/tutorials_Cancer-networks-and-data_43_0.png

…of connected nodes only

[35]:
p4c.create_subnetwork(edges='all',subnetwork_name='top disease quartile connected')  #handy way to exclude unconnected nodes!
[35]:
4149
[36]:
p4c.notebook_show_image()
[36]:
../_images/tutorials_Cancer-networks-and-data_46_0.png
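
Why would edges='all' drop unconnected nodes? The toy sketch below illustrates the idea in plain Python, assuming the new subnetwork consists of the chosen edges plus their endpoints. The node names and edge list are made up, and nodes_with_edges is a hypothetical helper, not a py4cytoscape function.

```python
def nodes_with_edges(nodes, edges):
    """Keep only nodes that are an endpoint of at least one edge between kept nodes."""
    allowed = set(nodes)
    connected = set()
    for source, target in edges:
        if source in allowed and target in allowed:
            connected.update((source, target))
    return sorted(connected)

toy_nodes = ['A', 'B', 'C', 'D']               # made-up names
toy_edges = [('A', 'B'), ('B', 'C')]           # 'D' has no edges
print(nodes_with_edges(toy_nodes, toy_edges))  # ['A', 'B', 'C']
```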

…from first neighbors of top disease score genes, using the network connectivity together with the data to direct discovery.

[38]:
p4c.set_current_network(network="STRING network - ovarian cancer")
[38]:
{}
[39]:
max(disease_score)
[39]:
5.0
[43]:
top_nodes = disease_score[disease_score==max(disease_score)].index.values.astype(str).tolist()
[44]:
top_nodes
[44]:
['2307',
 '2313',
 '2316',
 '2186',
 '2188',
 '2189',
 '2190',
 '2195',
 '2199',
 '2200',
 '2214',
 '2215',
 '2218',
 '2221',
 '2226',
 '2228',
 '2231',
 '2234',
 '2239',
 '2243',
 '2290',
 '2292']
[46]:
p4c.select_nodes(nodes=top_nodes)
[46]:
{'nodes': [2226,
  2290,
  2195,
  2292,
  2228,
  2231,
  2199,
  2200,
  2234,
  2239,
  2307,
  2243,
  2214,
  2215,
  2313,
  2218,
  2186,
  2188,
  2316,
  2189,
  2221,
  2190],
 'edges': []}
[47]:
p4c.select_first_neighbors()
[47]:
{'nodes': [2290,
  2291,
  2292,
  2293,
  2297,
  2298,
  2300,
  2301,
  2303,
  2304,
  2306,
  2307,
  2308,
  2309,
  2310,
  2313,
  2314,
  2316,
  2317,
  2318,
  2169,
  2172,
  2173,
  2174,
  2176,
  2179,
  2180,
  2183,
  2184,
  2185,
  2186,
  2187,
  2188,
  2189,
  2190,
  2191,
  2194,
  2195,
  2196,
  2197,
  2198,
  2199,
  2200,
  2201,
  2202,
  2203,
  2204,
  2207,
  2210,
  2212,
  2214,
  2215,
  2216,
  2217,
  2218,
  2219,
  2221,
  2222,
  2223,
  2224,
  2226,
  2227,
  2228,
  2230,
  2231,
  2232,
  2233,
  2234,
  2235,
  2237,
  2238,
  2239,
  2240,
  2241,
  2242,
  2243,
  2244,
  2245,
  2255,
  2256,
  2259,
  2260,
  2261,
  2262,
  2263,
  2264,
  2266,
  2268,
  2269,
  2271,
  2272,
  2274,
  2275,
  2280,
  2281,
  2283,
  2285],
 'edges': []}
[48]:
p4c.create_subnetwork('selected', subnetwork_name='top disease neighbors') # selected nodes, all connecting edges (default)
[48]:
4315
[49]:
p4c.notebook_show_image()
[49]:
../_images/tutorials_Cancer-networks-and-data_55_0.png
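
Note that the result of p4c.select_first_neighbors() above still contains the originally selected nodes (2307, for example, appears in both lists). The selection can be pictured with a toy edge list in plain Python. The names are made up, and first_neighbors is a hypothetical helper for illustration, not the Cytoscape implementation.

```python
def first_neighbors(seeds, edges):
    """Return the seeds plus every node that shares an edge with a seed."""
    seeds = set(seeds)
    selected = set(seeds)
    for source, target in edges:
        if source in seeds:
            selected.add(target)
        if target in seeds:
            selected.add(source)
    return sorted(selected)

toy_edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('E', 'F')]  # made-up network
print(first_neighbors(['B'], toy_edges))                       # ['A', 'B', 'C']
```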

…from diffusion algorithm starting with top disease score genes, using the network connectivity in a more subtle way than just first-degree neighbors.

[51]:
p4c.set_current_network(network="STRING network - ovarian cancer")
[51]:
{}
[52]:
p4c.select_nodes(nodes=top_nodes)
[52]:
{'nodes': [2226,
  2290,
  2195,
  2292,
  2228,
  2231,
  2199,
  2200,
  2234,
  2239,
  2307,
  2243,
  2214,
  2215,
  2313,
  2218,
  2186,
  2188,
  2316,
  2189,
  2221,
  2190],
 'edges': []}
[53]:
p4c.commands_post('diffusion diffuse') # diffusion!
[53]:
{'heatColumn': 'diffusion_output_heat', 'rankColumn': 'diffusion_output_rank'}
[54]:
p4c.create_subnetwork('selected', subnetwork_name='top disease diffusion')
[54]:
5036
[55]:
p4c.notebook_show_image()
[55]:
../_images/tutorials_Cancer-networks-and-data_61_0.png
[56]:
p4c.layout_network('force-directed')
[56]:
{}
[57]:
p4c.notebook_show_image()
[57]:
../_images/tutorials_Cancer-networks-and-data_63_0.png
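
To build some intuition for how diffusion differs from first-neighbor selection, here is a toy sketch in plain Python. It illustrates the general idea of heat spreading from seed nodes along edges. It is not the algorithm used by Cytoscape’s diffusion command, and the network, step count and step size are made up.

```python
def diffuse(edges, seeds, steps=10, dt=0.1):
    """Spread heat from seed nodes along edges using simple explicit time steps."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    # Seeds start hot, everything else cold.
    heat = {node: (1.0 if node in seeds else 0.0) for node in neighbors}
    for _ in range(steps):
        # Each node moves toward the heat of its neighbors; total heat is conserved.
        heat = {node: heat[node] + dt * sum(heat[other] - heat[node]
                                            for other in neighbors[node])
                for node in neighbors}
    return heat

toy_edges = [('A', 'B'), ('B', 'C'), ('C', 'D')]   # made-up path network
heat = diffuse(toy_edges, seeds={'A'})
ranking = sorted(heat, key=heat.get, reverse=True)  # hottest nodes first
print(ranking)
```

Unlike a first-neighbor selection, every reachable node receives some heat, so the result is a ranking rather than a fixed set. This is analogous in spirit to the heat and rank columns reported by the diffusion command above.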

Pro-tip: don’t forget to call p4c.set_current_network() to switch to the correct parent network before getting table column data and making selections.