Cancer networks and data

This notebook will demonstrate network retrieval from the STRING database, basic analysis, TCGA data loading and visualization in Cytoscape from Python using the py4cytoscape package.

by Kozo Nishida, Alexander Pico, Barry Demchak

py4cytoscape 0.0.11

Prerequisites

In addition to this package (py4cytoscape), you will need:

  • Cytoscape 3.8 or greater, which can be downloaded from https://cytoscape.org/download.html. Simply follow the installation instructions on screen.

  • Complete installation wizard

  • Launch Cytoscape

  • If your Cytoscape version is 3.8.2 or earlier, install the FileTransfer App (follow the linked instructions to do so)

NOTE: To run this notebook, you must manually start Cytoscape first – don’t proceed until you have started Cytoscape.

Setup required only in a remote notebook environment

If you’re using a remote Jupyter Notebook environment such as Google Colab, run the cell below. (If you’re running a local Jupyter Notebook server on the same desktop machine as Cytoscape, you can skip this step.)

[ ]:
_PY4CYTOSCAPE = 'git+https://github.com/cytoscape/py4cytoscape@0.0.11'
import requests
exec(requests.get("https://raw.githubusercontent.com/cytoscape/jupyter-bridge/master/client/p4c_init.py").text)
IPython.display.Javascript(_PY4CYTOSCAPE_BROWSER_CLIENT_JS) # Start browser client

Note that to use the current py4cytoscape release (instead of v0.0.11), remove the _PY4CYTOSCAPE= line in the snippet above.

Sanity test to verify Cytoscape connection

By now, the connection to Cytoscape should be up and available. To verify this, try a simple operation that doesn’t alter the state of Cytoscape, but verifies that you have everything installed.

[1]:
import py4cytoscape as p4c
[2]:
p4c.cytoscape_ping()
You are connected to Cytoscape!
[2]:
'You are connected to Cytoscape!'
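If the ping fails on a local setup, it can help to check whether Cytoscape's REST endpoint answers at all before suspecting py4cytoscape. The following is a minimal sketch using only the standard library. The helper name `cyrest_is_reachable` is hypothetical, and the default address `http://127.0.0.1:1234/v1` is assumed to be the local endpoint; it does not apply to the remote notebook configuration.

```python
import urllib.error
import urllib.request


def cyrest_is_reachable(base_url="http://127.0.0.1:1234/v1", timeout=2.0):
    """Return True if an HTTP GET to base_url succeeds, False otherwise.

    base_url is an assumed local address, not something py4cytoscape reports.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `cyrest_is_reachable()` before `p4c.cytoscape_ping()` separates "Cytoscape is not running" from problems inside the Python environment.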
[3]:
p4c.install_app('STRINGapp')
In commands_post(): java.lang.NullPointerException
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\py4cytoscape\commands.py in commands_post(cmd, base_url)
    389         r = _do_request('POST', post_url, json=post_body, headers=headers, base_url=base_url)
--> 390         r.raise_for_status()
    391         res = json.loads(r.text)

~\anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
    942         if http_error_msg:
--> 943             raise HTTPError(http_error_msg, response=self)
    944

HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:1234/v1/commands/apps/install

During handling of the above exception, another exception occurred:

CyError                                   Traceback (most recent call last)
<ipython-input-3-677f1db4cf8d> in <module>
----> 1 p4c.install_app('STRINGapp')

~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
    131             return log_return(func, value)
    132         except Exception as e:
--> 133             log_exception(func, e)
    134         finally:
    135             log_finally()

~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
    128         log_incoming(func, *args, **kwargs)
    129         try:
--> 130             value = func(*args, **kwargs) # Call function being logged
    131             return log_return(func, value)
    132         except Exception as e:

~\anaconda3\lib\site-packages\py4cytoscape\apps.py in install_app(app, base_url)
    133     """
    134     verify_supported_versions(1, 3.7, base_url=base_url)
--> 135     res = commands.commands_post(f'apps install app="{app}"', base_url=base_url)
    136     return narrate(res)
    137

~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
    131             return log_return(func, value)
    132         except Exception as e:
--> 133             log_exception(func, e)
    134         finally:
    135             log_finally()

~\anaconda3\lib\site-packages\py4cytoscape\py4cytoscape_logger.py in wrapper_log(*args, **kwargs)
    128         log_incoming(func, *args, **kwargs)
    129         try:
--> 130             value = func(*args, **kwargs) # Call function being logged
    131             return log_return(func, value)
    132         except Exception as e:

~\anaconda3\lib\site-packages\py4cytoscape\commands.py in commands_post(cmd, base_url)
    394         return res['data']
    395     except requests.exceptions.RequestException as e:
--> 396         _handle_error(e)
    397
    398

~\anaconda3\lib\site-packages\py4cytoscape\commands.py in _handle_error(e, force_cy_error)
    680             else:
    681                 show_error(f'In {caller}: {e}\n{content}')
--> 682         raise e
    683
    684

CyError: In commands_post(): java.lang.NullPointerException

Getting Disease Networks

Use Cytoscape to query the STRING database for networks of genes associated with breast cancer and ovarian cancer.

Note: if the STRING app is not installed, no error is reported, but the resulting network will be empty.

Query STRING database by disease to generate networks

Breast cancer

[ ]:
string_cmd = 'string disease query disease="breast cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)
[ ]:
p4c.notebook_export_show_image()

Here we write the query in Cytoscape’s command-line syntax, which works for any core or app automation function, and then send it as a GET request. Use p4c.commands_help to interrogate the functions and parameters available in your active Cytoscape session, including those of the apps you’ve installed!

[ ]:
p4c.commands_help('string')
[ ]:
p4c.commands_help('string disease query')
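Since the same command is reused with a different disease name, it may be convenient to build the string with a small helper. This is a sketch only: `string_disease_query` is a hypothetical name, and its defaults simply mirror the values used in this notebook. The result is a plain string intended to be passed to `p4c.commands_run`.

```python
def string_disease_query(disease, cutoff=0.9, species="Homo sapiens", limit=150):
    """Build a 'string disease query' command string (hypothetical helper)."""
    return (
        f'string disease query disease="{disease}" cutoff={cutoff} '
        f'species="{species}" limit={limit}'
    )


breast_cmd = string_disease_query("breast cancer")
ovarian_cmd = string_disease_query("ovarian cancer")
print(breast_cmd)
print(ovarian_cmd)
```

Keeping the quoting in one place avoids mismatched quotation marks around multi-word values such as the disease and species names.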

Ovarian cancer

[ ]:
string_cmd = 'string disease query disease="ovarian cancer" cutoff=0.9 species="Homo sapiens" limit=150'
p4c.commands_run(string_cmd)
[ ]:
p4c.notebook_export_show_image()

Interacting with Cytoscape

Now that we’ve got a couple of networks in Cytoscape, let’s see what we can do with them from Python…

Get list of networks

[ ]:
p4c.get_network_list()

Layout network

[ ]:
p4c.layout_network(layout_name='circular')
[ ]:
p4c.notebook_export_show_image()

List of layout algorithms available

[ ]:
p4c.get_layout_names()

Layout with parameters!

[ ]:
p4c.get_layout_property_names(layout_name='force-directed')
[ ]:
p4c.layout_network('force-directed defaultSpringCoefficient=0.0000008 defaultSpringLength=70')
[ ]:
p4c.notebook_export_show_image()

Get table data from network

Now, let’s look at the tabular data associated with our STRING networks…

[ ]:
p4c.get_table_column_names('node')

One of the great things about the STRING database is all the node and edge attributes it provides. Let’s pull some of them into Python to play with…

Retrieve disease scores

We can retrieve any set of columns from Cytoscape and store them as a Python pandas.DataFrame keyed by SUID. In this case, let’s retrieve the disease score column from the node table. The table name ('node') and the column name are our two arguments:

[ ]:
disease_score_table = p4c.get_table_columns('node','stringdb::disease score')
[ ]:
disease_score_table
[ ]:
disease_score = disease_score_table['stringdb::disease score'].astype('float')
node_suid = disease_score_table.index.values.astype(str)
[ ]:
disease_score
[ ]:
node_suid

Plot distribution and pick threshold

Now you can use Python as you normally would to explore the data.

[ ]:
import matplotlib.pyplot as plt
plt.figure(figsize=(25.6,19.2))
plt.xticks(rotation=270)
plt.scatter(node_suid, disease_score)
[ ]:
disease_score.describe()

Generate subnetworks

In order to reflect your exploration back onto the network, let’s generate subnetworks…

…from top quartile of ‘disease score’

[ ]:
top_quart = disease_score.quantile(q=0.75)
[ ]:
top_quart
[ ]:
top_nodes = disease_score[disease_score > top_quart].index.values.astype(str)
[ ]:
top_nodes.tolist()
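Note that a strict `>` comparison excludes nodes whose score equals the 75th-percentile value exactly. The sketch below illustrates the difference on a toy Series; the SUIDs and scores are made up for illustration and are not STRING output.

```python
import pandas as pd

# Hypothetical disease scores keyed by made-up SUIDs.
toy_scores = pd.Series(
    [3.0, 1.0, 5.0, 2.0, 4.0],
    index=[101, 102, 103, 104, 105],
)

threshold = toy_scores.quantile(q=0.75)          # linear interpolation by default
strict = toy_scores[toy_scores > threshold]      # excludes scores equal to the threshold
inclusive = toy_scores[toy_scores >= threshold]  # keeps scores equal to the threshold

# Same conversion as in the notebook: SUIDs as a list of strings.
strict_suids = strict.index.values.astype(str).tolist()
inclusive_suids = inclusive.index.values.astype(str).tolist()
print(threshold, strict_suids, inclusive_suids)
```

Which comparison is appropriate depends on whether nodes sitting exactly on the threshold should count as part of the top quartile.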
[ ]:
p4c.create_subnetwork(top_nodes.tolist(), subnetwork_name='top disease quartile')
#returns a Cytoscape network SUID
[ ]:
p4c.notebook_export_show_image()

…of connected nodes only

[ ]:
p4c.create_subnetwork(edges='all',subnetwork_name='top disease quartile connected')  #handy way to exclude unconnected nodes!
[ ]:
p4c.notebook_export_show_image()
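Conceptually, building a subnetwork from all edges keeps only the nodes that are an endpoint of at least one edge. The following pure-Python sketch shows that idea on a toy node and edge list; the names are placeholders, and it illustrates the concept rather than how Cytoscape implements it.

```python
def connected_nodes(nodes, edges):
    """Return the nodes that are an endpoint of at least one edge, in input order."""
    endpoints = {node for edge in edges for node in edge}
    return [node for node in nodes if node in endpoints]


# Placeholder node names and edges, for illustration only.
toy_nodes = ["geneA", "geneB", "geneC", "geneD"]
toy_edges = [("geneA", "geneB"), ("geneA", "geneC")]

kept = connected_nodes(toy_nodes, toy_edges)
print(kept)  # geneD has no edges, so it is dropped
```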

…from first neighbors of top disease score genes, using the network connectivity together with the data to direct discovery.

[ ]:
p4c.set_current_network(network="STRING network - ovarian cancer")
[ ]:
max(disease_score)
[ ]:
top_nodes = disease_score[disease_score==max(disease_score)].index.values.astype(str).tolist()
[ ]:
top_nodes
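Two details are worth keeping in mind when selecting the maximum: several nodes can share the top score, and Python's built-in `max` can return NaN when a missing value is present, whereas `Series.max()` skips missing values by default. The sketch below uses a made-up Series; whether real STRING results contain missing disease scores is an assumption here, not something this notebook shows.

```python
import pandas as pd

# Hypothetical scores keyed by made-up SUIDs, with one missing value and a tie.
toy_scores = pd.Series(
    [float("nan"), 4.2, 4.9, 4.9],
    index=[201, 202, 203, 204],
)

builtin_max = max(toy_scores)   # comparisons with NaN are False, so NaN can win
pandas_max = toy_scores.max()   # skips NaN by default

# All SUIDs that share the top score, as strings.
top = toy_scores[toy_scores == pandas_max].index.astype(str).tolist()
print(builtin_max, pandas_max, top)
```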
[ ]:
p4c.select_nodes(nodes=top_nodes)
[ ]:
p4c.select_first_neighbors()
[ ]:
p4c.create_subnetwork('selected', subnetwork_name='top disease neighbors') # selected nodes, all connecting edges (default)
[ ]:
p4c.notebook_export_show_image()
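The first-neighbor expansion can be pictured as follows: starting from a set of seed nodes, add every node that shares an edge with a seed. This is a pure-Python sketch on a toy edge list with placeholder names. Keeping the seeds in the result is an assumption made for this illustration.

```python
def first_neighbors(seeds, edges):
    """Return the seeds plus every node directly connected to a seed."""
    seed_set = set(seeds)
    selected = set(seed_set)
    for source, target in edges:
        if source in seed_set:
            selected.add(target)
        if target in seed_set:
            selected.add(source)
    return selected


# Placeholder path graph: A - B - C - D
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
selection = first_neighbors(["A"], toy_edges)
print(sorted(selection))
```

Only nodes one step away from a seed are added; C and D are two and three steps from A, so they stay unselected.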

…from diffusion algorithm starting with top disease score genes, using the network connectivity in a more subtle way than just first-degree neighbors.

[ ]:
p4c.set_current_network(network="STRING network - ovarian cancer")
[ ]:
p4c.select_nodes(nodes=top_nodes)
[ ]:
p4c.commands_post('diffusion diffuse') # diffusion!
[ ]:
p4c.create_subnetwork('selected', subnetwork_name='top disease diffusion')
[ ]:
p4c.notebook_export_show_image()
[ ]:
p4c.layout_network('force-directed')
[ ]:
p4c.notebook_export_show_image()
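This notebook does not show how the diffusion command works internally. As intuition only, the toy sketch below spreads "heat" from seed nodes along edges in small steps, so that nodes closer to the seeds, or better connected to them, end up hotter. The function name, the `rate` and `steps` parameters, and the graph are all hypothetical and make no claim to match Cytoscape's algorithm or defaults.

```python
def diffuse_heat(edges, seeds, rate=0.1, steps=10):
    """Toy heat diffusion: each step moves heat along edges toward cooler nodes."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Seeds start with one unit of heat each; all other nodes start cold.
    # Seeds that appear in no edge are ignored in this sketch.
    heat = {node: (1.0 if node in seeds else 0.0) for node in neighbors}

    for _ in range(steps):
        heat = {
            node: heat[node]
            + rate * sum(heat[other] - heat[node] for other in neighbors[node])
            for node in neighbors
        }
    return heat


# Placeholder path graph: A - B - C - D, with A as the only seed.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
heat = diffuse_heat(toy_edges, seeds={"A"})
ranked = sorted(heat, key=heat.get, reverse=True)
print(ranked)
```

Unlike first-neighbor selection, every reachable node receives some heat, and a cutoff on the ranking decides which nodes are kept.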

Pro-tip: don’t forget to call p4c.set_current_network() to switch to the correct parent network before getting table column data and making selections.
