Cancer networks and data¶
This notebook demonstrates network retrieval from the STRING database, basic analysis, TCGA data loading, and visualization in Cytoscape, all driven from Python with the py4cytoscape package.
by Kozo Nishida, Alexander Pico, Barry Demchak
py4cytoscape 0.0.11
Prerequisites¶
In addition to this package (py4cytoscape), you will need:
Cytoscape 3.8 or greater, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen.
Complete installation wizard
Launch Cytoscape
If your Cytoscape version is 3.8.2 or earlier, install the FileTransfer App (follow the linked instructions to do so).
NOTE: To run this notebook, you must manually start Cytoscape first – don’t proceed until you have started Cytoscape.
Setup required only in a remote notebook environment¶
If you’re using a remote Jupyter Notebook environment such as Google Colab, run the cell below. (If you’re running a local Jupyter Notebook server on the same machine as Cytoscape, you can skip this step.)
[ ]:
_PY4CYTOSCAPE = 'git+https://github.com/cytoscape/py4cytoscape@0.0.11'
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client
Note that to use the current py4cytoscape release (instead of v0.0.11), remove the _PY4CYTOSCAPE= line in the snippet above.
Sanity test to verify Cytoscape connection¶
By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn’t alter the state of Cytoscape, but verifies that you have everything installed.
[1]:
import py4cytoscape as p4c
[2]:
p4c.cytoscape_ping()
You are connected to Cytoscape!
[2]:
'You are connected to Cytoscape!'
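If the ping fails, it can help to check whether anything is listening on the CyREST port at all. The following is a minimal sketch using only the Python standard library. It assumes Cytoscape exposes CyREST at the default address `http://127.0.0.1:1234/v1`, and the helper name `cytoscape_is_reachable` is hypothetical (it is not part of py4cytoscape).

```python
import http.client
import urllib.request

# Assumed default CyREST endpoint; adjust if Cytoscape listens elsewhere.
DEFAULT_BASE_URL = "http://127.0.0.1:1234/v1"


def cytoscape_is_reachable(base_url=DEFAULT_BASE_URL, timeout=2.0):
    """Return True if an HTTP GET to base_url succeeds, False otherwise.

    Hypothetical helper, not part of py4cytoscape.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP error responses.
        return False


print("Cytoscape reachable:", cytoscape_is_reachable())
```

A `False` result usually means Cytoscape has not been started yet or is listening on a different port.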
[3]:
p4c.install_app('STRINGapp')
In commands_post(): java.lang.NullPointerException
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
~\anaconda3\lib\site-packages\py4cytoscape\commands.py in commands_post(cmd, base_url)
389 r = _do_request('POST', post_url, json=post_body, headers=headers, base_url=base_url)
--> 390 r.raise_for_status()
391 res = json.loads(r.text)
~\anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
942 if http_error_msg:
--> 943 raise HTTPError(http_error_msg, response=self)
944
HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:1234/v1/commands/apps/install
During handling of the above exception, another exception occurred:
CyError Traceback (most recent call last)
<ipython-input-3-677f1db4cf8d> in <module>
----> 1 p4c.install_app('STRINGapp')
~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
131 return log_return(func, value)
132 except Exception as e:
--> 133 log_exception(func, e)
134 finally:
135 log_finally()
~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
128 log_incoming(func, *args, **kwargs)
129 try:
--> 130 value = func(*args, **kwargs) # Call function being logged
131 return log_return(func, value)
132 except Exception as e:
~\anaconda3\lib\site-packages\py4cytoscape\apps.py in install_app(app, base_url)
133 """
134 verify_supported_versions(1, 3.7, base_url=base_url)
--> 135 res = commands.commands_post(f'apps install app="{app}"', base_url=base_url)
136 return narrate(res)
137
~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
131 return log_return(func, value)
132 except Exception as e:
--> 133 log_exception(func, e)
134 finally:
135 log_finally()
~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
128 log_incoming(func, *args, **kwargs)
129 try:
--> 130 value = func(*args, **kwargs) # Call function being logged
131 return log_return(func, value)
132 except Exception as e:
~\anaconda3\lib\site-packages\py4cytoscape\commands.py in commands_post(cmd, base_url)
394 return res['data']
395 except requests.exceptions.RequestException as e:
--> 396 _handle_error(e)
397
398
~\anaconda3\lib\site-packages\py4cytoscape\commands.py in _handle_error(e, force_cy_error)
680 else:
681 show_error(f'In {caller}: {e}\n{content}')
--> 682 raise e
683
684
CyError: In commands_post(): java.lang.NullPointerException
Getting Disease Networks¶
Use Cytoscape to query the STRING database for networks of genes associated with breast cancer and ovarian cancer.
If the STRING app is not installed, no error is reported, but your network will be empty.
Query STRING database by disease to generate networks¶
Breast cancer¶
[ ]:
string_cmd = 'string disease query disease="breast cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)
[ ]:
p4c.notebook_export_show_image()
Here we are using Cytoscape’s command-line syntax, which can be used for any core or app automation function; p4c.commands_run then sends the command to Cytoscape as a GET request. Use p4c.commands_help to interrogate the functions and parameters available in your active Cytoscape session, including those of the apps you’ve installed!
[ ]:
p4c.commands_help('string')
[ ]:
p4c.commands_help('string disease query')
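The command strings used above follow a simple `name=value` pattern, with text values wrapped in double quotes. As a small sketch, a helper can assemble such a string from keyword arguments so that only the disease name changes between queries. The function `build_command` is hypothetical and not part of py4cytoscape; the quoting rule is inferred from the commands shown in this notebook.

```python
def build_command(command, **params):
    """Assemble a Cytoscape command string from keyword parameters.

    Hypothetical helper: string values are wrapped in double quotes,
    other values (such as numbers) are written as-is.
    """
    parts = [command]
    for name, value in params.items():
        if isinstance(value, str):
            parts.append(f'{name}="{value}"')
        else:
            parts.append(f"{name}={value}")
    return " ".join(parts)


string_cmd = build_command(
    "string disease query",
    disease="breast cancer",
    cutoff=0.9,
    species="Homo sapiens",
    limit=150,
)
print(string_cmd)
```

The resulting string is identical to the breast cancer query above and could be passed to p4c.commands_run in the same way.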
Ovarian cancer¶
[ ]:
string_cmd = 'string disease query disease="ovarian cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)
[ ]:
p4c.notebook_export_show_image()
Interacting with Cytoscape¶
Now that we’ve got a couple of networks into Cytoscape, let’s see what we can do with them from Python…
Get list of networks¶
[ ]:
p4c.get_network_list()
Layout network¶
[ ]:
p4c.layout_network(layout_name='circular')
[ ]:
p4c.notebook_export_show_image()
List of layout algorithms available¶
[ ]:
p4c.get_layout_names()
Layout with parameters!¶
[ ]:
p4c.get_layout_property_names(layout_name='force-directed')
[ ]:
p4c.layout_network('force-directed defaultSpringCoefficient=0.0000008 defaultSpringLength=70')
[ ]:
p4c.notebook_export_show_image()
Get table data from network¶
Now, let’s look at the tabular data associated with our STRING networks…
[ ]:
p4c.get_table_column_names('node')
One of the great things about the STRING database is all the node and edge attributes it provides. Let’s pull some of them into Python to play with…
Retrieve disease scores¶
We can retrieve any set of columns from Cytoscape and store them as a Python pandas.DataFrame keyed by SUID. In this case, let’s retrieve the disease score column from the node table. The table name and the column name are our two parameters:
[ ]:
disease_score_table = p4c.get_table_columns('node','stringdb::disease score')
[ ]:
disease_score_table
[ ]:
disease_score = disease_score_table['stringdb::disease score'].astype('float')
node_suid = disease_score_table.index.values.astype(str)
[ ]:
disease_score
[ ]:
node_suid
Plot distribution and pick threshold¶
Now you can use Python as you normally would to explore the data.
[ ]:
import matplotlib.pyplot as plt
plt.figure(figsize=(25.6,19.2))
plt.xticks(rotation=270)
plt.scatter(node_suid, disease_score)
[ ]:
disease_score.describe()
Generate subnetworks¶
In order to reflect your exploration back onto the network, let’s generate subnetworks…
…from top quartile of ‘disease score’
[ ]:
top_quart = disease_score.quantile(q=0.75)
[ ]:
top_quart
[ ]:
top_nodes = disease_score[disease_score > top_quart].index.values.astype(str)
[ ]:
top_nodes.tolist()
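To make the thresholding step concrete, here is a standard-library sketch of the same logic on a made-up set of scores. The SUIDs and values are hypothetical. The `inclusive` method of `statistics.quantiles` interpolates linearly between data points, which should correspond to the default behavior of the pandas `quantile` call used above.

```python
import statistics

# Hypothetical SUID -> disease score mapping standing in for the node table.
scores = {101: 1.0, 102: 2.0, 103: 3.0, 104: 4.0, 105: 5.0}

# With n=4 the function returns three cut points; the last one is the
# 75th percentile (upper quartile boundary).
top_quart = statistics.quantiles(scores.values(), n=4, method="inclusive")[2]

# Keep only nodes strictly above the threshold, as SUID strings.
top_nodes = [str(suid) for suid, score in scores.items() if score > top_quart]

print(top_quart)
print(top_nodes)
```

Because the comparison is strict, a node whose score equals the threshold exactly is left out of the selection.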
[ ]:
p4c.create_subnetwork(top_nodes.tolist(), subnetwork_name='top disease quartile')
#returns a Cytoscape network SUID
[ ]:
p4c.notebook_export_show_image()
…of connected nodes only
[ ]:
p4c.create_subnetwork(edges='all',subnetwork_name='top disease quartile connected') #handy way to exclude unconnected nodes!
[ ]:
p4c.notebook_export_show_image()
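The idea behind keeping connected nodes only can be sketched in plain Python: a node survives if it is an endpoint of at least one edge. The toy nodes and edges below are hypothetical, and the snippet only illustrates the filtering idea rather than how Cytoscape builds the subnetwork.

```python
# Hypothetical toy network: node names and undirected edges.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C")]

# A node counts as connected if it is an endpoint of at least one edge.
connected = {endpoint for edge in edges for endpoint in edge}

# Preserve the original node order while dropping isolated nodes.
connected_nodes = [node for node in nodes if node in connected]

print(connected_nodes)
```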
…from first neighbors of top disease score genes, using the network connectivity together with the data to direct discovery.
[ ]:
p4c.set_current_network(network="STRING network - ovarian cancer")
[ ]:
max(disease_score)
[ ]:
top_nodes = disease_score[disease_score==max(disease_score)].index.values.astype(str).tolist()
[ ]:
top_nodes
[ ]:
p4c.select_nodes(nodes=top_nodes)
[ ]:
p4c.select_first_neighbors()
[ ]:
p4c.create_subnetwork('selected', subnetwork_name='top disease neighbors') # selected nodes, all connecting edges (default)
[ ]:
p4c.notebook_export_show_image()
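Conceptually, expanding a selection to first neighbors means adding every node that is exactly one edge away from a selected node. The sketch below shows this on a hypothetical toy edge list. The function name `with_first_neighbors` is made up for illustration, and the snippet assumes the originally selected nodes stay in the selection.

```python
from collections import defaultdict

# Hypothetical toy network given as undirected edges.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]

# Build an adjacency map: node -> set of directly connected nodes.
adjacency = defaultdict(set)
for source, target in edges:
    adjacency[source].add(target)
    adjacency[target].add(source)


def with_first_neighbors(selected):
    """Return the selected nodes plus every node one edge away from them."""
    expanded = set(selected)
    for node in selected:
        expanded |= adjacency[node]
    return expanded


selection = with_first_neighbors({"A"})
print(sorted(selection))
```

Starting from node A, the expanded selection contains A itself plus its direct neighbors B and E, while C and D are two or more edges away and stay unselected.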
…from diffusion algorithm starting with top disease score genes, using the network connectivity in a more subtle way than just first-degree neighbors.
[ ]:
p4c.set_current_network(network="STRING network - ovarian cancer")
[ ]:
p4c.select_nodes(nodes=top_nodes)
[ ]:
p4c.commands_post('diffusion diffuse') # diffusion!
[ ]:
p4c.create_subnetwork('selected', subnetwork_name='top disease diffusion')
[ ]:
p4c.notebook_export_show_image()
[ ]:
p4c.layout_network('force-directed')
[ ]:
p4c.notebook_export_show_image()
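This notebook does not show what `diffusion diffuse` computes internally. For rough intuition only, the sketch below assumes a generic heat-diffusion model, which may differ from what the Diffusion service actually runs. Heat starts on a seed node and spreads along edges, so nodes that are better connected to the seed end up warmer. The toy network, the diffusion time and the step count are all arbitrary, hypothetical choices.

```python
# Hypothetical toy network: a simple path A - B - C - D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

neighbors = {node: set() for node in nodes}
for source, target in edges:
    neighbors[source].add(target)
    neighbors[target].add(source)


def diffuse(heat, total_time=0.1, steps=10):
    """Approximate heat diffusion (dh/dt = -L h) with explicit Euler steps."""
    dt = total_time / steps
    for _ in range(steps):
        # Each node exchanges heat with its neighbors in proportion
        # to the difference between their current values.
        heat = {
            node: value + dt * sum(heat[other] - value for other in neighbors[node])
            for node, value in heat.items()
        }
    return heat


# All heat starts on the seed node, mirroring a single selected node.
initial = {node: 0.0 for node in nodes}
initial["A"] = 1.0

final = diffuse(initial)
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)
```

Unlike the first-neighbor expansion, every reachable node receives some heat, and a cutoff on the ranked values decides which nodes end up in the result.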
Pro-tip: don’t forget to call p4c.set_current_network() to switch to the correct parent network before getting table column data and making selections.