[1]:
%config InlineBackend.figure_formats = ['png']
%matplotlib inline
[2]:
import os
import warnings

warnings.filterwarnings('ignore')

Overview of Functionality

[Figure: overview of bioplexpy functionality]

Examples of how to use functions from bioplexpy

Use the cd magic command to move up to the parent directory

[3]:
cd ..
/home/docs/checkouts/readthedocs.org/user_builds/bioplexpy/checkouts/v1.1.1

Import functions from modules

[4]:
# analysis funcs
from bioplexpy.analysis_funcs import (
    PDB_chains_to_uniprot,
    PDB_to_interacting_chains_uniprot_maps,
    bioplex2graph,
    get_DataFrame_from_PPI_network,
    get_interacting_chains_from_PDB,
    get_PPI_network_for_complex,
    get_prop_edges_in_complex_identified,
    list_uniprot_pdb_mappings,
    resampling_test_for_uniprot_list,
)

# data import funcs
from bioplexpy.data_import_funcs import (
    get_PDB_from_UniProts,
    get_UniProts_from_CORUM,
    getBioPlex,
    getCorum,
    getGSE122425,
)

# visualization funcs
from bioplexpy.visualization_funcs import (
    display_PDB_network_for_complex,
    display_PPI_network_for_complex,
    display_PPI_network_match_PDB,
)

[1] getBioPlex - function to retrieve interaction data

Description

Load BioPlex interaction data - This function loads BioPlex PPI data for the HEK293T and HCT116 cell lines. Note that only version 1.0 is available for HCT116 cells.

Parameters

  1. cell_line : str

  • Takes input: ‘293T’ or ‘HCT116’

  2. version : str

  • Takes input: ‘1.0’, ‘2.0’ or ‘3.0’ (only ‘1.0’ for ‘HCT116’)

Returns

Pandas DataFrame

  • A DataFrame with each row corresponding to a protein-protein interaction (PPI).

Column Descriptions

GeneA: Entrez Gene ID for the first interacting protein

GeneB: Entrez Gene ID for the second interacting protein

UniprotA: UniProt ID for the first interacting protein

UniprotB: UniProt ID for the second interacting protein

SymbolA: Gene symbol for the first interacting protein

SymbolB: Gene symbol for the second interacting protein

pW: Probability of wrong protein ID (CompPASS-Plus)

pNI: Probability of nonspecific background, i.e. not an interactor (CompPASS-Plus)

pInt: Probability of high-confidence interaction (CompPASS-Plus)
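The three CompPASS-Plus columns describe alternative outcomes for each pair, and the rows printed in Example 1 are consistent with pW + pNI + pInt summing to about 1. The sketch below checks that on a few rows copied from that output and then keeps only pairs above a confidence cutoff. The cutoff value PINT_CUTOFF = 0.9 is a hypothetical illustration, not a threshold recommended by BioPlex or bioplexpy; with real data you would start from the DataFrame returned by getBioPlex instead of the toy table.

```python
import pandas as pd

# Toy rows copied from the bp_293t.head() output in Example 1
toy_ppis = pd.DataFrame(
    {
        "SymbolA": ["ADA", "BEND7", "BEND7"],
        "SymbolB": ["POTEF", "RPL13", "RBBP4"],
        "pW": [6.881844e-10, 1.340380e-18, 7.221401e-21],
        "pNI": [0.000118, 0.225664, 0.000064],
        "pInt": [0.999882, 0.774336, 0.999936],
    }
)

# The three CompPASS-Plus probabilities should add up to (approximately) 1 per row
row_sums = toy_ppis[["pW", "pNI", "pInt"]].sum(axis=1)
print(row_sums.round(6).tolist())

# Keep only interactions above a hypothetical confidence cutoff
PINT_CUTOFF = 0.9  # illustrative value only
high_conf = toy_ppis[toy_ppis["pInt"] >= PINT_CUTOFF]
print(high_conf[["SymbolA", "SymbolB"]].values.tolist())
```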

Example 1

Load DataFrames that contain protein-protein interactions from HEK293T & HCT116 cell lines

Huttlin, E. L., Bruckner, R. J., Navarrete-Perea, J., Cannon, J. R., Baltier, K., Gebreab, F., … & Gygi, S. P. (2021). Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell, 184(11), 3022-3040.

[5]:
bp_293t = getBioPlex('293T', '3.0')
[6]:
bp_293t.head()
[6]:
    GeneA   GeneB  UniprotA  UniprotB  SymbolA  SymbolB            pW       pNI      pInt
0     100  728378    P00813    A5A3E0      ADA    POTEF  6.881844e-10  0.000118  0.999882
1  222389    6137  Q8N7W2-2    P26373    BEND7    RPL13  1.340380e-18  0.225664  0.774336
2  222389    5928  Q8N7W2-2  Q09028-3    BEND7    RBBP4  7.221401e-21  0.000064  0.999936
3  222389   25873  Q8N7W2-2    Q9Y3U8    BEND7    RPL36  7.058372e-17  0.128183  0.871817
4  222389    6124  Q8N7W2-2    P36578    BEND7     RPL4  1.632313e-22  0.200638  0.799362

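Example 2

Load the DataFrame of protein-protein interactions from the HCT116 cell line (only version 1.0 is available)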

[7]:
bp_hct116 = getBioPlex('HCT116', '1.0')
[8]:
bp_hct116.head()
[8]:
   GeneA   GeneB  UniprotA  UniprotB   SymbolA  SymbolB            pW       pNI      pInt
0  88455   50649    Q8IZ07  Q9NR80-4  ANKRD13A  ARHGEF4  3.959215e-04  0.000033  0.999571
1  88455  115106    Q8IZ07    Q96CS2  ANKRD13A    HAUS1  4.488473e-02  0.001935  0.953181
2  88455   23086    Q8IZ07  Q8NEV8-2  ANKRD13A    EXPH5  7.402394e-05  0.000930  0.998996
3  88455   54930    Q8IZ07    Q9H6D7  ANKRD13A    HAUS4  9.180959e-07  0.000128  0.999871
4  88455   79441    Q8IZ07    Q68CZ6  ANKRD13A    HAUS3  8.709394e-07  0.001495  0.998504

[2] getGSE122425 - function to retrieve HEK293 RNA-seq expression data

Description

Retrieve HEK293 RNA-seq expression data.

Returns

adata : AnnData object

  • An AnnData object (the Python counterpart of a Bioconductor SummarizedExperiment) holding HEK293 raw counts, with an added layer storing RPKM values.

Example 1

Load an AnnData object containing mRNA profiles (RNA-seq) of wild-type (WT) and NSUN2-/- HEK293 cells, generated by deep sequencing in triplicate.

GEO series GSE122425: Effects of NSUN2 deficiency on the mRNA 5-methylcytosine modification and gene expression profile in HEK293 cells (RNA-Seq)

[9]:
HEK293_adata = getGSE122425()
[10]:
HEK293_adata
[10]:
AnnData object with n_obs × n_vars = 57905 × 6
    obs: 'SYMBOL', 'KO', 'GO', 'length'
    layers: 'rpkm'

List the first 10 rows (observations), which correspond to genes (Ensembl gene IDs)

[11]:
print(HEK293_adata.obs_names[:10].tolist())
['ENSG00000223972', 'ENSG00000227232', 'ENSG00000243485', 'ENSG00000237613', 'ENSG00000268020', 'ENSG00000240361', 'ENSG00000186092', 'ENSG00000238009', 'ENSG00000239945', 'ENSG00000233750']

List the columns (variables), which correspond to the NSUN2 knockout (NK) and wild-type (WT) replicates

[12]:
print(HEK293_adata.var_names.tolist())
['NK.1', 'NK.2', 'NK.3', 'WT.1', 'WT.2', 'WT.3']

Print the matrix of raw counts (X)

[13]:
print(HEK293_adata.X)
[[   0    0    2    1    2    2]
 [ 705  812 1121  732  690  804]
 [   0    0    0    0    0    2]
 ...
 [   0    0    0    0    0    0]
 [   0    0    0    0    0    0]
 [   0    0    0    0    0    0]]

Print the matrix of RPKM values stored in the 'rpkm' layer

[14]:
print(HEK293_adata.layers["rpkm"])
[[0.   0.   0.01 0.01 0.01 0.01]
 [4.77 5.21 6.8  5.43 5.07 5.39]
 [0.   0.   0.   0.   0.   0.04]
 ...
 [0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.  ]]
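To make the relation between the two matrices concrete, here is a minimal sketch of one common RPKM definition: reads per kilobase of gene length per million mapped reads, RPKM = count × 10^9 / (gene length in bp × total reads in the sample). The helper rpkm is hypothetical, and it assumes gene lengths like those in the obs 'length' column and library sizes taken as plain column sums. Whether bioplexpy's 'rpkm' layer was computed exactly this way (for example with or without library-size normalization factors) is not stated here, so the numbers may not match the layer exactly.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """RPKM = count * 1e9 / (gene length in bp * total reads in that sample)."""
    counts = np.asarray(counts, dtype=float)          # genes (rows) x samples (columns)
    lengths = np.asarray(gene_lengths_bp, dtype=float)
    library_sizes = counts.sum(axis=0)                # one total per sample column
    return counts * 1e9 / (lengths[:, None] * library_sizes[None, :])

# Hypothetical toy data: 3 genes x 2 samples
toy_counts = np.array([[10, 20], [30, 60], [0, 0]])
toy_lengths = np.array([1000, 2000, 500])
print(rpkm(toy_counts, toy_lengths))
```

With the AnnData object above, a comparable (hedged) call would be rpkm(HEK293_adata.X, HEK293_adata.obs['length'].values).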

[3] bioplex2graph - function to convert BioPlex PPIs into a graph data structure

Description

Convert BioPlex PPIs into a graph - This function converts a DataFrame of BioPlex PPIs into a NetworkX graph.

Parameters

  1. DataFrame of PPIs : Pandas DataFrame

Returns

NetworkX graph

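  • A NetworkX graph with nodes = UniProt IDs (isoform suffix removed; Entrez ID, gene symbol, isoform and bait status stored as node attributes) and edges = interactions (pW, pNI and pInt stored as edge attributes).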
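The node and edge attributes shown in Example 2 suggest how such a conversion can work. The sketch below is a hypothetical re-implementation (ppi_df_to_graph is not bioplexpy's source): it assumes, as the resampling loop in [8] does, that UniprotA holds the bait, strips isoform suffixes such as "-2" to form node IDs, and keeps the attributes from the first time a protein is seen.

```python
import networkx as nx
import pandas as pd

def ppi_df_to_graph(ppi_df: pd.DataFrame) -> nx.Graph:
    """Hypothetical sketch of a BioPlex DataFrame -> NetworkX conversion."""
    G = nx.Graph()
    # Assumption: column A holds the bait; node IDs drop the isoform suffix
    baits = {iso.split("-")[0] for iso in ppi_df["UniprotA"]}
    for row in ppi_df.itertuples(index=False):
        ends = [(row.UniprotA, row.GeneA, row.SymbolA), (row.UniprotB, row.GeneB, row.SymbolB)]
        for isoform, entrez, symbol in ends:
            node = isoform.split("-")[0]
            if node not in G:  # keep attributes from the first occurrence
                G.add_node(node, entrezid=entrez, symbol=symbol, isoform=isoform, bait=node in baits)
        # Store the CompPASS-Plus probabilities on the edge
        G.add_edge(
            row.UniprotA.split("-")[0],
            row.UniprotB.split("-")[0],
            pW=row.pW, pNI=row.pNI, pInt=row.pInt,
        )
    return G

# Two rows copied from the 293T v3.0 output in [1]
toy_df = pd.DataFrame({
    "GeneA": [100, 222389], "GeneB": [728378, 6137],
    "UniprotA": ["P00813", "Q8N7W2-2"], "UniprotB": ["A5A3E0", "P26373"],
    "SymbolA": ["ADA", "BEND7"], "SymbolB": ["POTEF", "RPL13"],
    "pW": [6.881844e-10, 1.340380e-18], "pNI": [0.000118, 0.225664],
    "pInt": [0.999882, 0.774336],
})
toy_G = ppi_df_to_graph(toy_df)
print(toy_G.nodes["Q8N7W2"])
```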

Example 1

  1. Obtain the latest version of the 293T PPI network

[15]:
bp_293t_df = getBioPlex('293T', '3.0')
[16]:
bp_293t_df.head()
[16]:
    GeneA   GeneB  UniprotA  UniprotB  SymbolA  SymbolB            pW       pNI      pInt
0     100  728378    P00813    A5A3E0      ADA    POTEF  6.881844e-10  0.000118  0.999882
1  222389    6137  Q8N7W2-2    P26373    BEND7    RPL13  1.340380e-18  0.225664  0.774336
2  222389    5928  Q8N7W2-2  Q09028-3    BEND7    RBBP4  7.221401e-21  0.000064  0.999936
3  222389   25873  Q8N7W2-2    Q9Y3U8    BEND7    RPL36  7.058372e-17  0.128183  0.871817
4  222389    6124  Q8N7W2-2    P36578    BEND7     RPL4  1.632313e-22  0.200638  0.799362

  2. Convert the DataFrame into a graph with NetworkX

[17]:
bp_293t_G = bioplex2graph(bp_293t_df)

Example 2 - Examine properties of the network

Analyze the nodes in the PPI network

[18]:
len(bp_293t_G.nodes()) # number of nodes
[18]:
13689
[19]:
list(bp_293t_G.nodes())[0:5]
[19]:
['P00813', 'A5A3E0', 'Q8N7W2', 'P26373', 'Q09028']

Access the attributes stored for individual nodes

[20]:
bp_293t_G.nodes['P00813']
[20]:
{'entrezid': 100, 'symbol': 'ADA', 'isoform': 'P00813', 'bait': True}
[21]:
bp_293t_G.nodes['Q8N7W2']
[21]:
{'entrezid': 222389, 'symbol': 'BEND7', 'isoform': 'Q8N7W2-2', 'bait': True}

Analyze the edges in the PPI network

[22]:
len(bp_293t_G.edges()) # number of edges
[22]:
115868
[23]:
list(bp_293t_G.edges())[0:5]
[23]:
[('P00813', 'A5A3E0'),
 ('Q8N7W2', 'P26373'),
 ('Q8N7W2', 'Q09028'),
 ('Q8N7W2', 'Q9Y3U8'),
 ('Q8N7W2', 'P36578')]

Access the attributes stored for an individual edge (PPI)

[24]:
bp_293t_G.get_edge_data('P00813', 'A5A3E0')
[24]:
{'pW': 6.88184379952655e-10,
 'pNI': 0.000117635665707,
 'pInt': 0.999882363646109}

Example 3 - Run network algorithms using NetworkX

[25]:
import networkx as nx
import pandas as pd

Run the PageRank algorithm on the network and rank nodes by score

[26]:
bp_293t_G_pr = nx.pagerank(bp_293t_G, alpha=0.8)
bp_293t_G_pr = pd.Series(bp_293t_G_pr)
bp_293t_G_pr.sort_values(ascending = False, inplace = True)
[27]:
bp_293t_G_pr.head(n=5)
[27]:
P11142    0.002610
P11021    0.002168
Q04917    0.001840
O14556    0.001527
P0CG47    0.001471
dtype: float64
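To illustrate what nx.pagerank computes, here is a minimal power-iteration sketch (the function name pagerank_power_iteration is hypothetical). alpha is the damping factor: with probability alpha the random walker follows an edge, otherwise it jumps to a uniformly random node, so each step applies r ← alpha · Pᵀr + (1 − alpha)/n. NetworkX's own implementation handles tolerances, edge weights and personalization in more detail, so treat this as an illustration rather than a drop-in replacement.

```python
import networkx as nx
import numpy as np

def pagerank_power_iteration(G, alpha=0.8, tol=1e-10, max_iter=1000):
    """Minimal PageRank sketch using repeated r <- alpha * P^T r + (1 - alpha) / n."""
    nodes = list(G.nodes())
    n = len(nodes)
    A = nx.to_numpy_array(G, nodelist=nodes)  # unweighted edges become 1.0
    degree = A.sum(axis=1)
    P = A / np.where(degree > 0, degree, 1.0)[:, None]  # row-stochastic transitions
    P[degree == 0] = 1.0 / n  # isolated (dangling) nodes jump uniformly
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = alpha * (P.T @ r) + (1.0 - alpha) / n
        converged = np.abs(r_next - r).sum() < tol
        r = r_next
        if converged:
            break
    return dict(zip(nodes, r))

# Toy star graph: node 0 is a hub linked to three leaves (nodes 1-3)
star = nx.star_graph(3)
pr = pagerank_power_iteration(star, alpha=0.8)
print(max(pr, key=pr.get))
```

On this toy graph the hub receives the highest score, and the scores sum to 1, as they do for the BioPlex ranking above.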

[4] getCorum - function to retrieve CORUM complex data

Description

Retrieve the CORUM protein complex data.

Parameters

  1. complex_set : str

  • Maps to CORUM files (default ‘all’):

    • ‘all’ -> corum_allComplexes.txt

    • ‘drug’ -> corum_drugs.txt

    • ‘splice’ -> corum_spliceComplexes.txt

    • ‘partial’ -> corum_partialComplexes.txt

  2. organism : str

  • Takes input ‘Bovine’, ‘Dog’, ‘Hamster’, ‘Human’, ‘MINK’, ‘Mammalia’, ‘Mouse’, ‘Pig’, ‘Rabbit’, ‘Rat’ (default ‘Human’).

Returns

Pandas DataFrame

  • A dataframe with each row corresponding to a CORUM complex.

Example 1

Retrieve all CORUM complexes for Human

[28]:
all_Human_CORUM_df = getCorum()
[29]:
all_Human_CORUM_df.head(n=3)
[29]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
0 1 BCL6-HDAC4 complex NaN Human U2OS osteosarcoma-derived UTA-L cells 11929873 Transcriptional repression by BCL6 is thought ... NaN NaN NaN ... histone deacetylase activity;nucleus;DNA topol... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN
1 2 BCL6-HDAC5 complex NaN Human U2OS osteosarcoma-derived UTA-L cells 11929873 Transcriptional repression by BCL6 is thought ... NaN NaN NaN ... histone deacetylase activity;nucleus;DNA topol... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN
2 3 BCL6-HDAC7 complex NaN Human U2OS osteosarcoma-derived UTA-L cells 11929873 Transcriptional repression by BCL6 is thought ... NaN NaN NaN ... histone deacetylase activity;nucleus;DNA topol... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

3 rows × 34 columns

[30]:
all_Human_CORUM_df.shape
[30]:
(5376, 34)

Example 2

Retrieve all CORUM complexes for Mouse

[31]:
core_Mouse_CORUM_df = getCorum('all','Mouse')
[32]:
core_Mouse_CORUM_df.head(n=3)
[32]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
0 9 Ahr-Arnt complex 6S-nuclear aryl hydrocarbon (Ah) receptor liga... Mouse Hepa-1cells 1317062 Arnt contains a basic helix-loop-helix motif, ... NaN NaN NaN ... DNA binding;nucleus;regulation of DNA-template... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN
1 14 BLOC-2 (biogenesis of lysosome-related organel... NaN Mouse liver 14718540 The results indicate that the Hps3, Hps5, and ... NaN HPS1-7 are involved in Hermansky-Pudlak syndro... NaN ... endosome organization;vacuole organization;lys... biological_process;biological_process;biologic... NaN NaN NaN NaN NaN NaN NaN NaN
2 24 BLOC-1 (biogenesis of lysosome-related organel... NaN Mouse liver 15102850 The authors identified Snapin, BLOS1, BLOS2, a... NaN PLDN, MUTED, CNO and DTNBP1 are involved in He... NaN ... endosome organization;vacuole organization;lys... biological_process;biological_process;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

3 rows × 34 columns

[33]:
core_Mouse_CORUM_df.shape
[33]:
(1284, 34)

[5] get_PPI_network_for_complex - function that returns the matching edges (PPI data) for a given CORUM complex as a subgraph

Description

This function returns a subgraph of PPIs identified through AP-MS between the proteins in a specified CORUM complex.

Parameters

  1. Network of PPIs : NetworkX graph

  2. DataFrame of CORUM complexes : Pandas DataFrame

  3. Corum Complex ID: int

Returns

NetworkX Graph

  • A subgraph induced by the proteins in a CORUM complex from the BioPlex network used as input.
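An induced subgraph keeps every edge of the full network whose two endpoints both belong to the chosen node set. The sketch below shows that idea with NetworkX's Graph.subgraph; complex_subgraph is a hypothetical helper that takes a list of UniProt IDs (such as the list get_UniProts_from_CORUM returns) and silently skips members that are absent from the network, which is an assumption about how missing proteins are handled.

```python
import networkx as nx

def complex_subgraph(G: nx.Graph, complex_uniprots: list) -> nx.Graph:
    """Subgraph induced by the complex members present in G (hypothetical helper)."""
    members_in_network = [u for u in complex_uniprots if u in G]
    # .subgraph() keeps every edge of G whose two endpoints are both members;
    # .copy() turns the read-only view into an independent graph
    return G.subgraph(members_in_network).copy()

# Toy network: A, B and C are complex members; D is an outside interactor
demo_net = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])
sub = complex_subgraph(demo_net, ["A", "B", "C", "E"])  # "E" is not in the network
print(sub.number_of_nodes(), sub.number_of_edges())
```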

Example 1

  1. Obtain the latest version of the 293T PPI network

[34]:
bp_293t_df = getBioPlex('293T', '3.0')

  2. Obtain the NetworkX graph representation of the 293T PPI network

[35]:
bp_293t_G = bioplex2graph(bp_293t_df)

  3. Obtain all CORUM complexes for Human

[36]:
Corum_DF = getCorum()

  4. Get edges detected via AP-MS for the ING2 complex from HEK293T cell line PPI data version 3.0 (ING2 complex ID: 2851)

[37]:
ING2_bp_293t_G = get_PPI_network_for_complex(bp_293t_G, Corum_DF, 2851)
[38]:
len(list(ING2_bp_293t_G.edges))
[38]:
56

Example 2

Get edges detected via AP-MS for BCOR complex from HEK293T cell line PPI data version 3.0 (BCOR complex ID: 1178)

[39]:
BCOR_bp_293t_G = get_PPI_network_for_complex(bp_293t_G, Corum_DF, 1178)
[40]:
len(list(BCOR_bp_293t_G.edges))
[40]:
14

[6] get_DataFrame_from_PPI_network - function that takes an AP-MS graph (NetworkX) and returns a DataFrame of PPIs (edges)

Description

This function converts a graph of PPIs (identified through AP-MS) into a DataFrame with one row per edge.

Parameters

  1. Network of PPIs : NetworkX graph

Returns

Pandas DataFrame

  • A DataFrame of edges (AP-MS interactions) from a network.
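Converting in the other direction amounts to walking G.edges(data=True) and looking up node attributes for the symbols. The sketch below is a hypothetical helper (graph_to_ppi_df) that produces the same seven columns as the output in Example 1; it assumes nodes carry a 'symbol' attribute and edges carry pW, pNI and pInt, as in the graphs built by bioplex2graph.

```python
import networkx as nx
import pandas as pd

COLUMNS = ["UniprotA", "UniprotB", "SymbolA", "SymbolB", "pW", "pNI", "pInt"]

def graph_to_ppi_df(G: nx.Graph) -> pd.DataFrame:
    """One row per edge, with node symbols and edge probabilities (hypothetical helper)."""
    rows = []
    for u, v, attrs in G.edges(data=True):
        rows.append({
            "UniprotA": u,
            "UniprotB": v,
            "SymbolA": G.nodes[u].get("symbol"),
            "SymbolB": G.nodes[v].get("symbol"),
            "pW": attrs.get("pW"),
            "pNI": attrs.get("pNI"),
            "pInt": attrs.get("pInt"),
        })
    return pd.DataFrame(rows, columns=COLUMNS)

# Toy graph built from the first two rows of the ING2 example output
demo = nx.Graph()
demo.add_node("Q09028", symbol="RBBP4")
demo.add_node("O75446", symbol="SAP30")
demo.add_node("Q13547", symbol="HDAC1")
demo.add_edge("Q09028", "O75446", pW=3.703916e-21, pNI=0.000023, pInt=0.999977)
demo.add_edge("Q09028", "Q13547", pW=5.003673e-30, pNI=0.000005, pInt=0.999995)
edges_df = graph_to_ppi_df(demo)
print(edges_df.shape)
```

If node symbols are not needed, NetworkX's built-in nx.to_pandas_edgelist(G, source="UniprotA", target="UniprotB") returns the edge attributes directly.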

Example 1

  1. Obtain the latest version of the 293T PPI network

[41]:
bp_293t_df = getBioPlex('293T', '3.0')

  2. Obtain the NetworkX graph representation of the 293T PPI network

[42]:
bp_293t_G = bioplex2graph(bp_293t_df)

  3. Obtain all CORUM complexes for Human

[43]:
Corum_DF = getCorum()

  4. Get edges detected via AP-MS for the ING2 complex from HEK293T cell line PPI data version 3.0 (ING2 complex ID: 2851)

[44]:
ING2_bp_293t_G = get_PPI_network_for_complex(bp_293t_G, Corum_DF, 2851)

  5. Convert the ING2 AP-MS network into a DataFrame with each row corresponding to an edge

[45]:
ING2_bp_293t_df = get_DataFrame_from_PPI_network(ING2_bp_293t_G)
[46]:
ING2_bp_293t_df.head()
[46]:
   UniprotA  UniprotB  SymbolA  SymbolB            pW       pNI      pInt
0    Q09028    O75446    RBBP4    SAP30  3.703916e-21  0.000023  0.999977
1    Q09028    Q13547    RBBP4    HDAC1  5.003673e-30  0.000005  0.999995
2    Q09028    Q92769    RBBP4    HDAC2  3.457866e-18  0.029766  0.970234
3    Q09028    P29374    RBBP4   ARID4A  7.463030e-29  0.000013  0.999987
4    Q09028    Q16576    RBBP4    RBBP7  5.123652e-30  0.000021  0.999979
[47]:
ING2_bp_293t_df.shape
[47]:
(56, 7)

Example 2

  1. Get edges detected via AP-MS for BCOR complex from HEK293T cell line PPI data version 3.0 (BCOR complex ID: 1178)

[48]:
BCOR_bp_293t_G = get_PPI_network_for_complex(bp_293t_G, Corum_DF, 1178)

  2. Convert the BCOR AP-MS network into a DataFrame with each row corresponding to an edge

[49]:
BCOR_bp_293t_df = get_DataFrame_from_PPI_network(BCOR_bp_293t_G)
[50]:
BCOR_bp_293t_df.head()
[50]:
   UniprotA  UniprotB  SymbolA  SymbolB            pW           pNI      pInt
0    Q9BSM1    Q8NHM5    PCGF1    KDM2B  3.189895e-09  1.691454e-07  1.000000
1    Q9BSM1    Q6W2J9    PCGF1     BCOR  1.342302e-14  2.582672e-06  0.999997
2    Q9BSM1    Q8N488    PCGF1     RYBP  7.760669e-19  3.110777e-11  1.000000
3    Q9BSM1    Q99496    PCGF1     RNF2  2.181103e-13  3.714136e-03  0.996286
4    Q9BSM1    Q06587    PCGF1    RING1  7.298679e-14  3.132373e-04  0.999687
[51]:
BCOR_bp_293t_df.shape
[51]:
(14, 7)

[7] get_prop_edges_in_complex_identified - function that returns the proportion of interactions between proteins in a CORUM complex detected by AP-MS

Description

This function returns the proportion of all possible PPIs identified through AP-MS between the proteins in a specified CORUM complex.

Parameters

  1. Network of PPIs : NetworkX graph

  2. DataFrame of CORUM complexes : Pandas DataFrame

  3. Corum Complex ID: int

Returns

Float

  • The proportion of all possible interactions between proteins in the CORUM complex that were identified in the AP-MS PPI data.
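The returned value can be read as detected edges divided by the number of possible protein pairs, n(n − 1)/2 for a complex of n proteins. As a consistency check, the 0.718 reported for ING2 in Example 1 matches the 56 AP-MS edges found for the same complex in [5] out of 78 possible pairs, i.e. 13 proteins. That member count is inferred from the arithmetic, not stated in the source; whether bioplexpy counts all CORUM members or only those present in the network, and the rounding to 3 decimals, are assumptions of this hypothetical helper.

```python
from math import comb

def prop_edges_identified(num_detected_edges: int, num_complex_proteins: int) -> float:
    """Detected edges / all possible pairs n * (n - 1) / 2, rounded to 3 decimals."""
    possible_pairs = comb(num_complex_proteins, 2)
    return round(num_detected_edges / possible_pairs, 3)

# 56 detected ING2 edges with an assumed 13 complex members (78 possible pairs)
print(prop_edges_identified(56, 13))
```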

Example 1

  1. Obtain the latest version of the 293T PPI network

[52]:
bp_293t_df = getBioPlex('293T', '3.0')

  2. Obtain the NetworkX graph representation of the 293T PPI network

[53]:
bp_293t_G = bioplex2graph(bp_293t_df)

  3. Obtain all CORUM complexes for Human

[54]:
Corum_DF = getCorum()

  4. Get the proportion of interactions identified for the ING2 complex from HEK293T cell line PPI data version 3.0 (ING2 complex ID: 2851)

[55]:
get_prop_edges_in_complex_identified(bp_293t_G, Corum_DF, 2851)
[55]:
0.718

Example 2

  1. Get proportion of interactions identified for Arp2/3 complex from HEK293T cell line PPI data version 3.0 (Arp2/3 complex ID: 27)

[56]:
get_prop_edges_in_complex_identified(bp_293t_G, Corum_DF, 27)
[56]:
0.667

[8] resampling_test_for_uniprot_list - function that runs a resampling test to check whether the PPI network induced by a given set of proteins is enriched for interactions

Description

This function returns a p-value from a resampling test that works by

  1. taking the number of proteins in the specified list of UniProt IDs (N),

  2. choosing N random proteins from the graph generated from all of the PPI data (G),

  3. calculating the number of edges in the subgraph (S) induced by the N random proteins (which must have the same proportion of baits, +/- 10%, as the input list) and storing this value (E_i),

  4. if the preserve_node_degree option is set, additionally requiring the baits and preys in S to have the same degree distributions as the baits and preys, respectively, in the network induced by the given UniProt IDs,

  5. repeating steps 2-4 num_resamples times to create a null distribution,

  6. calculating the number of edges between the N proteins in the given list (E),

  7. returning a p-value based on the proportion of values [E_1, E_2, … , E_num_resamples] that are greater than or equal to E. The outputs below are consistent with the add-one form (1 + number of E_i ≥ E) / (num_resamples + 1); for example, 1/1001 ≈ 0.000999 when no resample reaches E with num_resamples = 1000.

Parameters

  1. Network of PPIs : NetworkX graph

  2. list of uniprot IDs : list

  3. Number of Resamples: int

  4. option to preserve degree distribution in subgraphs : bool

Returns

Float

  • A p-value from a resampling test that checks for enrichment of PPIs detected between proteins in the list
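The core of the resampling procedure described above can be sketched in a few lines. This is a deliberately simplified, hypothetical version (simple_resampling_pvalue): it samples random node sets uniformly, without the bait-proportion matching or the preserve_node_degree constraint that bioplexpy applies, and it uses the add-one p-value inferred from the outputs in this section.

```python
import random

import networkx as nx

def simple_resampling_pvalue(G, protein_list, num_resamples=1000, seed=None):
    """Simplified sketch: uniform node sampling, no bait-proportion or degree matching."""
    rng = random.Random(seed)
    proteins = [p for p in protein_list if p in G]
    observed = G.subgraph(proteins).number_of_edges()  # E
    if observed == 0:
        raise ValueError("no edges detected for this protein list")
    nodes = list(G.nodes())
    n_exceed = 0
    for _ in range(num_resamples):
        sample = rng.sample(nodes, len(proteins))
        if G.subgraph(sample).number_of_edges() >= observed:  # E_i >= E
            n_exceed += 1
    # Add-one estimate: the observed set counts as one extra sample, so p is never 0
    return (n_exceed + 1) / (num_resamples + 1)

# Toy network: a 5-protein clique plus a 200-node chain of unrelated proteins
demo_graph = nx.complete_graph(["A", "B", "C", "D", "E"])
nx.add_path(demo_graph, range(200))
p_value = simple_resampling_pvalue(demo_graph, ["A", "B", "C", "D", "E"], 1000, seed=0)
print(p_value)
```

Because random 5-node samples from this toy network almost never form the clique, the sketch returns the smallest possible value, 1/1001, the same value reported for the Arp2/3 and ING2 complexes below.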

Example 1

  1. Obtain the latest version of the 293T PPI network

[57]:
bp_293t_df = getBioPlex('293T', '3.0')

  2. Obtain the NetworkX graph representation of the 293T PPI network

[58]:
bp_293t_G = bioplex2graph(bp_293t_df)

  3. Obtain all CORUM complexes for Human

[59]:
Corum_DF = getCorum()

  4. Get the list of UniProt IDs for the Arp2/3 complex

[60]:
UniProts_Arp_2_3 = get_UniProts_from_CORUM(Corum_DF, Complex_ID = 27)

  5. Calculate the p-value to check for enrichment of edges in the Arp2/3 complex from HEK293T cell line PPI data version 3.0 (Arp2/3 complex ID: 27)

  • Preserve the degrees of baits & preys in the complex in the randomly sampled subgraphs. Note: this option increases the run time; if the call takes too long, try decreasing the number of resamples or set preserve_node_degree = False.

[61]:
resampling_test_for_uniprot_list(bp_293t_G, UniProts_Arp_2_3, 1000, preserve_node_degree = True)
[61]:
0.000999000999000999

Example 2

  1. Get the list of UniProt IDs for the ING2 complex

[62]:
UniProts_ING2 = get_UniProts_from_CORUM(Corum_DF, Complex_ID = 2851)

  2. Calculate the p-value to check for enrichment of edges in the ING2 complex from HEK293T cell line PPI data version 3.0 (ING2 complex ID: 2851)

[63]:
resampling_test_for_uniprot_list(bp_293t_G, UniProts_ING2, 1000)
[63]:
0.000999000999000999

Example 3 - Iterate through all CORUM complexes and run the resampling test on each one using 293T v3 PPI data

[64]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

For each CORUM complex tested, check that:

  • the complex has \(\ge 3\) genes in it

  • at least one of the genes in the complex was targeted as a bait

  • the complex has at least 1 edge detected in the PPI data

Note: the resampling test slows down when fewer than 10% of a complex's proteins are baits, because more of the random subgraphs fail to match that low bait proportion and are discarded from the p-value calculation.

[65]:
Corum_DF.head(n=2)
[65]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
0 1 BCL6-HDAC4 complex NaN Human U2OS osteosarcoma-derived UTA-L cells 11929873 Transcriptional repression by BCL6 is thought ... NaN NaN NaN ... histone deacetylase activity;nucleus;DNA topol... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN
1 2 BCL6-HDAC5 complex NaN Human U2OS osteosarcoma-derived UTA-L cells 11929873 Transcriptional repression by BCL6 is thought ... NaN NaN NaN ... histone deacetylase activity;nucleus;DNA topol... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

2 rows × 34 columns

[66]:
Corum_DF.shape
[66]:
(5376, 34)


Running this for every complex takes a couple of minutes; to speed up the run time we set num_resamples = 100.

[67]:
resampling_pvals_CORUM_complex_dict = {}

# get set of baits
bp_i_baits = set(bp_293t_df.UniprotA)

count_i = 0
for CORUM_complex_ID in Corum_DF.complex_id:

    # check to see if CORUM complex has >= 3 genes in it
    genes_in_complex_i = get_UniProts_from_CORUM(Corum_DF, Complex_ID = CORUM_complex_ID)
    if len(genes_in_complex_i) >= 3:

        # check to see that at least one of the genes was targeted as a bait
        genes_as_baits_bool = np.array([gene_i in bp_i_baits for gene_i in genes_in_complex_i])
        if np.sum(genes_as_baits_bool) >= 1:

            # check to see that CORUM complex has at least 1 edge
            CORUM_complex_G = get_PPI_network_for_complex(bp_293t_G, Corum_DF, CORUM_complex_ID)
            if len(list(CORUM_complex_G.edges)) >= 1:

                resampling_pvals_CORUM_complex_dict[CORUM_complex_ID] = resampling_test_for_uniprot_list(bp_293t_G, genes_in_complex_i, 100)

    count_i += 1
    if count_i % 1000 == 0:
        print(count_i)

resampling_pvals_CORUM_complex_series = pd.Series(resampling_pvals_CORUM_complex_dict)
1000
2000
3000
4000
5000
[68]:
len(resampling_pvals_CORUM_complex_series) # number of CORUM complexes actually tested
[68]:
1417
[69]:
plt.style.use('ggplot')
plt.rcParams['lines.linewidth']=1.0
plt.rcParams['axes.facecolor']='1.0'
plt.rcParams['xtick.color']='black'
plt.rcParams['axes.grid']=False
plt.rcParams['axes.edgecolor']='black'
plt.rcParams['grid.color']= '1.0'
plt.rcParams.update({'font.size': 14})

fig, ax = plt.subplots()

ax.hist(resampling_pvals_CORUM_complex_series, bins = 40, rwidth = 0.85, color = 'black')
ax.set_yscale('log')

ax.set_title('Distribution of p-vals from resampling test on every CORUM\nHuman core complex w/ 293T v3 BioPlex interaction data ', fontsize = 12, color = 'k', pad = -15)
ax.set_ylabel(f'Number of CORUM complexes\n(N={len(resampling_pvals_CORUM_complex_series)})', fontsize = 12, color = 'k', labelpad = 1)
ax.set_xlabel('p-val from resampling test' , fontsize = 12, color = 'k', labelpad = 1)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.grid(False)
ax.tick_params(labelcolor = 'k')
ax.tick_params(axis='y', which='major', labelsize=12 , labelcolor = 'k')
ax.tick_params(axis='x', which='major', labelsize=12 , labelcolor = 'k')

fig = plt.gcf()
fig.set_size_inches(7.5, 4.5)
fig.tight_layout()

plt.show()
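[Figure: distribution of resampling-test p-values for every tested CORUM Human complex using 293T v3 BioPlex interaction data (y-axis: number of CORUM complexes, log scale; x-axis: p-val from resampling test)]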
[70]:
resampling_pvals_CORUM_complex_series.sort_values(ascending = True, inplace = True)
resampling_pvals_CORUM_complex_series.head()
[70]:
6884    0.009901
6903    0.009901
6902    0.009901
6901    0.009901
6900    0.009901
dtype: float64
[71]:
resampling_pvals_CORUM_complex_series.tail(n=15)
[71]:
777      0.039604
9975     0.039604
6194     0.039604
5177     0.039604
2875     0.039604
3137     0.039604
245      0.049505
1625     0.049505
10781    0.049505
8911     0.059406
49       0.059406
1624     0.059406
7557     0.079208
5613     0.128713
5615     0.178218
dtype: float64
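Because the loop above uses num_resamples = 100, every p-value printed here is a multiple of 1/101 (for example 0.009901 ≈ 1/101 and 0.178218 ≈ 18/101). This is consistent with the add-one form p = (k + 1) / (num_resamples + 1), where k is the number of random subgraphs with at least as many edges as the complex. Under that assumption, the hypothetical helper below recovers k from a printed p-value.

```python
NUM_RESAMPLES = 100  # value used in the loop above

def exceedances_from_pvalue(p: float, num_resamples: int = NUM_RESAMPLES) -> int:
    """Invert p = (k + 1) / (num_resamples + 1) to recover k (assumed add-one form)."""
    return round(p * (num_resamples + 1)) - 1

for p in [0.009901, 0.039604, 0.128713, 0.178218]:
    print(p, exceedances_from_pvalue(p))
```

Under this reading, the smallest attainable p-value is 1/101, so the complexes at the head of the sorted series had no random subgraph reach their edge count, while complex 5615 (p ≈ 0.178) was matched or exceeded by 17 of the 100 random subgraphs.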

Example 4 - differential enrichment between PPI networks

Run resampling test on some of the same complexes for BioPlex HEK293T v1, v2 & v3 PPI networks

[72]:
# 293T v1 PPI network
bp_293t_v1_df = getBioPlex('293T', '1.0')
bp_293t_v1_G = bioplex2graph(bp_293t_v1_df)

# 293T v2 PPI network
bp_293t_v2_df = getBioPlex('293T', '2.0')
bp_293t_v2_G = bioplex2graph(bp_293t_v2_df)

# 293T v3 PPI network
bp_293t_v3_df = getBioPlex('293T', '3.0')
bp_293t_v3_G = bioplex2graph(bp_293t_v3_df)

Arp2/3 complex from HEK293T cell line PPI data version 1.0, 2.0 & 3.0 (Arp2/3 complex ID: 27)

[73]:
UniProts_Arp_2_3 = get_UniProts_from_CORUM(Corum_DF, Complex_ID = 27)
[74]:
resampling_test_for_uniprot_list(bp_293t_v1_G, UniProts_Arp_2_3, 1000)
[74]:
0.000999000999000999
[75]:
resampling_test_for_uniprot_list(bp_293t_v2_G, UniProts_Arp_2_3, 1000)
[75]:
0.000999000999000999
[76]:
resampling_test_for_uniprot_list(bp_293t_v3_G, UniProts_Arp_2_3, 1000)
[76]:
0.000999000999000999

ING2 complex from HEK293T cell line PPI data version 1.0, 2.0 & 3.0 (ING2 complex ID: 2851)

[77]:
UniProts_ING2 = get_UniProts_from_CORUM(Corum_DF, Complex_ID = 2851)
[78]:
resampling_test_for_uniprot_list(bp_293t_v1_G, UniProts_ING2, 1000)
[78]:
0.000999000999000999
[79]:
resampling_test_for_uniprot_list(bp_293t_v2_G, UniProts_ING2, 1000)
[79]:
0.000999000999000999
[80]:
resampling_test_for_uniprot_list(bp_293t_v3_G, UniProts_ING2, 1000)
[80]:
0.000999000999000999

CASP8-FADD-MALT1-BCL10 complex from HEK293T cell line PPI data version 1.0, 2.0 & 3.0 (CASP8-FADD-MALT1-BCL10 complex ID: 2054)

Note: the tests on v1.0 and v2.0 are expected to fail because no edges between these proteins were detected until v3.0.

[81]:
UniProts_list = get_UniProts_from_CORUM(Corum_DF, Complex_ID = 2054)
[82]:
resampling_test_for_uniprot_list(bp_293t_v1_G, UniProts_list, 1000)
ERROR: no edges detected in PPI data for this protein list, p-value could not be computed.
[83]:
resampling_test_for_uniprot_list(bp_293t_v2_G, UniProts_list, 1000)
ERROR: no edges detected in PPI data for this protein list, p-value could not be computed.
[84]:
resampling_test_for_uniprot_list(bp_293t_v3_G, UniProts_list, 1000)
[84]:
0.007992007992007992

[9] display_PPI_network_for_complex - function to visualize PPI data for a given complex from CORUM

Description

Display network of BioPlex PPIs for a CORUM complex - This function uses NetworkX to draw a complete network in which nodes represent the proteins in a specified CORUM complex and edges represent all possible pairwise interactions between them; edges detected through AP-MS in the BioPlex PPI data are drawn in a darker color.

See Huttlin, E. L., Bruckner, R. J., Navarrete-Perea, J., Cannon, J. R., Baltier, K., Gebreab, F., … & Gygi, S. P. (2021). Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell, 184(11), 3022-3040. for reference.

Parameters

  1. ax object to draw on: Matplotlib Axes

  2. DataFrame of PPIs : Pandas DataFrame

  3. DataFrame of CORUM complexes : Pandas DataFrame

  4. Corum Complex ID: int

  5. Size of Nodes in Network: int

  6. Width of Edges in Network: float

  7. optional Size of font for Node Labels: int

  8. optional Color of Nodes targeted as baits (bait_node_color): str

  9. optional Color of Nodes detected as preys only (prey_node_color): str

  10. optional Color of Edges observed via AP-MS from PPI data (AP_MS_edge_color): str

  11. optional NetworkX Position of Nodes (node_pos): dict

Returns

Node Positions

  • Dictionary of Node Positions in NetworkX layout.
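The drawing described above can be approximated with standard NetworkX drawing calls. In this minimal sketch, the helper draw_complex_network, its arguments, the circular layout and the default colors are all hypothetical, not bioplexpy's implementation. It builds a complete graph over the complex members with nx.complete_graph, draws undetected pairs in light gray and AP-MS-detected pairs darker, colors baits and prey-only nodes differently, and returns the node positions so they can be passed back in, much like node_pos in the examples below.

```python
import matplotlib.pyplot as plt
import networkx as nx

def draw_complex_network(ax, proteins, detected_edges, baits, node_pos=None):
    """Hypothetical sketch: complete graph of complex members, AP-MS edges drawn darker."""
    K = nx.complete_graph(proteins)  # every possible pair of members
    detected = {frozenset(e) for e in detected_edges}
    pos = node_pos if node_pos is not None else nx.circular_layout(K)
    missing = [e for e in K.edges() if frozenset(e) not in detected]
    found = [e for e in K.edges() if frozenset(e) in detected]
    if missing:  # pairs not observed by AP-MS
        nx.draw_networkx_edges(K, pos, edgelist=missing, edge_color="lightgray", width=1.0, ax=ax)
    if found:  # pairs observed by AP-MS
        nx.draw_networkx_edges(K, pos, edgelist=found, edge_color="black", width=3.5, ax=ax)
    colors = ["darkred" if p in baits else "lightcoral" for p in K.nodes()]
    nx.draw_networkx_nodes(K, pos, node_color=colors, node_size=2300, ax=ax)
    nx.draw_networkx_labels(K, pos, font_size=10, ax=ax)
    ax.axis("off")
    return pos  # reuse as node_pos to draw another network with the same layout

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
layout = draw_complex_network(ax1, ["A", "B", "C", "D"], [("A", "B")], baits={"A"})
same_layout = draw_complex_network(
    ax2, ["A", "B", "C", "D"], [("A", "B"), ("C", "D")], baits={"A", "C"}, node_pos=layout
)
```

Returning the positions is what lets two panels share one layout, which makes differences in detected edges between cell lines or versions easy to compare by eye.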

Example 1

[85]:
import matplotlib.pyplot as plt

  1. Obtain the latest version of the 293T PPI network

[86]:
bp_293t_df = getBioPlex('293T', '3.0')

  2. Obtain all CORUM complexes for Human

[87]:
Corum_DF = getCorum()

  3. Visualize the network for the specified protein complex using PPI data (ING2 complex ID: 2851)

ING2 complex from HEK293T cell line PPI data version 3.0

[88]:
Corum_DF[Corum_DF.complex_name == 'ING2 complex']
[88]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
1093 2851 ING2 complex NaN Human HeLa S3 cells 16387653 ING2 is in an HDAC complex similar to ING1.ING... NaN NaN NaN ... mitotic cell cycle;angiogenesis;nucleus;DNA to... biological_process;biological_process;cellular... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

[89]:
fig, ax = plt.subplots()

ING2_node_layout = display_PPI_network_for_complex(ax, bp_293t_df, Corum_DF, 2851, 2300, 3.5)

fig = plt.gcf()
fig.set_size_inches(7.5, 7.5)
fig.tight_layout()


# save figure as PNG
# plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = True)
plt.show()
[Figure: BioPlex PPI network for the ING2 complex (CORUM ID 2851), HEK293T v3.0]

Example 2

Arp2/3 complex from HEK293T cell line PPI data version 3.0

[90]:
Corum_DF[Corum_DF.complex_name == 'Arp2/3 protein complex']
[90]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
10 27 Arp2/3 protein complex ARP2/3 protein complex Human Neutrophils 9359840 NaN NaN Transcript levels of ARP2/3 complex subunits, ... NaN ... Arp2/3 protein complex;regulation of actin fil... cellular_component;biological_process NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

[91]:
fig, ax = plt.subplots()

Arp23_node_layout = display_PPI_network_for_complex(ax, bp_293t_df, Corum_DF, 27, 2300, 3.5)

fig = plt.gcf()
fig.set_size_inches(7.5, 7.5)
fig.tight_layout()

#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = True)
plt.show()
[Figure: BioPlex PPI network for the Arp2/3 protein complex (CORUM ID 27), HEK293T v3.0]

Example 3

[92]:
from matplotlib import gridspec

COP9 signalosome complex from HCT116 cell line PPI data version 1.0 & HEK293T cell line PPI data version 3.0

[93]:
Corum_DF[Corum_DF.complex_name == 'COP9 signalosome complex']
[93]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
802 2174 COP9 signalosome complex JAB1-containing signalosome (GPS1, COPS2, COPS... Human HeLa cells; human JU77 mesothelioma cells 9535219 The purified complex is very similar, if not i... Since the authors did not specify COP7, we use... NaN NaN ... COP9 signalosome;COP9 signalosome assembly cellular_component;biological_process NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

[94]:
fig = plt.figure(constrained_layout=True)
spec = gridspec.GridSpec(ncols=2, nrows=1, figure=fig) # define 2 columns since we'll have two networks
spec.update(wspace=0.025) # set the spacing between axes.

# HCT116 1.0
ax_HCT116_v1 = fig.add_subplot(spec[0]) # create axes object for HCT116 v1 network
bp_HCT116_v1_PPI_df = getBioPlex('HCT116', '1.0') # load PPI data for HCT116 v1
COP9_node_layout = display_PPI_network_for_complex(ax_HCT116_v1, bp_HCT116_v1_PPI_df, Corum_DF, 2174, 2300, 3.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue')

# HEK293T 3.0
ax_293T_v3 = fig.add_subplot(spec[1]) # create axes object for HEK293T v3 network
bp_293T_v3_df = getBioPlex('293T', '3.0') # load PPI data for HEK293T v3
COP9_node_layout = display_PPI_network_for_complex(ax_293T_v3, bp_293T_v3_df, Corum_DF, 2174, 2300, 3.5, node_pos=COP9_node_layout)

fig = plt.gcf()
fig.set_size_inches(15, 7.5)
fig.tight_layout()

#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = True)
plt.show()
[Figure: BioPlex PPI networks for the COP9 signalosome complex (CORUM ID 2174); left: HCT116 v1.0, right: HEK293T v3.0]

Example 4

Fanconi anemia core complex from HCT116 cell line PPI data version 1.0 & HEK293T cell line PPI data version 3.0

[95]:
Corum_DF[Corum_DF.complex_name == 'FA core complex 1']
[95]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
683 1623 FA core complex 1 Fanconi anemia core complex 1 Human NaN 12093742 FANCE functions to target cytoplasmic FANCC to... NaN FA complex is involved in Fanconi anemia (FA) ... NaN ... nucleus;DNA damage response;protein ubiquitina... cellular_component;biological_process;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

[96]:
fig = plt.figure(constrained_layout=True)
spec = gridspec.GridSpec(ncols=2, nrows=1, figure=fig) # define 2 columns since we'll have two networks
spec.update(wspace=0.025) # set the spacing between axes.

# HCT116 1.0
ax_HCT116_v1 = fig.add_subplot(spec[0]) # create axes object for HCT116 v1 network
bp_HCT116_v1_PPI_df = getBioPlex('HCT116', '1.0') # load PPI data for HCT116 v1
Fanconi_Anemia_node_layout = display_PPI_network_for_complex(ax_HCT116_v1, bp_HCT116_v1_PPI_df, Corum_DF, 1623, 2300, 3.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue')
ax_HCT116_v1.set_title('HCT116 v1.0', color = 'black', fontsize = 14) # set title

# HEK293T 3.0
ax_293T_v3 = fig.add_subplot(spec[1]) # create axes object for HEK293T v3 network
bp_293T_v3_df = getBioPlex('293T', '3.0') # load PPI data for HEK293T v3
Fanconi_Anemia_node_layout = display_PPI_network_for_complex(ax_293T_v3, bp_293T_v3_df, Corum_DF, 1623, 2300, 3.5, node_pos = Fanconi_Anemia_node_layout)
ax_293T_v3.set_title('293T v3.0', color = 'black', fontsize = 14) # set title

fig = plt.gcf()
fig.set_size_inches(15, 7.5)
fig.tight_layout()

# save figure as PNG
#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = True)
plt.show()
_images/BioPlex_Examples_235_0.png

Example 5 - Recreate Figure 1 from Cell 2021 BioPlex 3.0 paper

Exosome complex

[97]:
Corum_DF[Corum_DF.complex_id == 7443]
[97]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
2626 7443 Exosome NaN Human HeLa cells 20531389 NaN NaN NaN NaN ... RNA exonuclease activity;cytoplasm;RNA catabol... molecular_function;cellular_component;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

BCOR complex

[98]:
Corum_DF[Corum_DF.complex_id == 1178]
[98]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
555 1178 BCOR complex Ubiquitin E3 ligase Human NaN 16943429 Ubiquitin E3 ligases covalently attach ubiquit... NaN NaN NaN ... nucleus;DNA-templated transcription;ubiquitin-... cellular_component;biological_process;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

[99]:
fig = plt.figure(constrained_layout=True)
spec = gridspec.GridSpec(ncols=4, nrows=2, figure=fig) # define a 4-column x 2-row grid since we'll have eight networks
spec.update(wspace=0.01,hspace=0.01) # set the spacing between axes.

# HEK293T 3.0 - create axes objects for HEK293T v3 network
ax_293T_v3_A = fig.add_subplot(spec[0,0])
ax_293T_v3_C = fig.add_subplot(spec[0,2])
ax_293T_v3_E = fig.add_subplot(spec[1,0])
ax_293T_v3_G = fig.add_subplot(spec[1,2])

bp_293T_v3_df = getBioPlex('293T', '3.0') # load PPI data for HEK293T v3

Fanconi_Anemia_node_layout = display_PPI_network_for_complex(ax_293T_v3_A, bp_293T_v3_df, Corum_DF, 1623, 800, 1.5, node_font_size=5.5)
Exosome_node_layout = display_PPI_network_for_complex(ax_293T_v3_C, bp_293T_v3_df, Corum_DF, 7443, 800, 1.5, node_font_size=5.5)
COP9_node_layout = display_PPI_network_for_complex(ax_293T_v3_E, bp_293T_v3_df, Corum_DF, 2174, 800, 1.5, node_font_size=5.5)
BCOR_node_layout = display_PPI_network_for_complex(ax_293T_v3_G, bp_293T_v3_df, Corum_DF, 1178, 800, 1.5, node_font_size=5.5)

# set titles
ax_293T_v3_A.set_title('Fanconi Anemia Core Complex\n293T v3.0', color = 'black', fontsize = 10)
ax_293T_v3_C.set_title('Exosome complex\n293T v3.0', color = 'black', fontsize = 10)
ax_293T_v3_E.set_title('COP9 signalosome complex\n293T v3.0', color = 'black', fontsize = 10)
ax_293T_v3_G.set_title('BCOR complex\n293T v3.0', color = 'black', fontsize = 10)

# HCT116 1.0 - create axes objects for HCT116 v1 network
ax_HCT116_v1_B = fig.add_subplot(spec[0,1])
ax_HCT116_v1_D = fig.add_subplot(spec[0,3])
ax_HCT116_v1_F = fig.add_subplot(spec[1,1])
ax_HCT116_v1_H = fig.add_subplot(spec[1,3])

bp_HCT116_v1_df = getBioPlex('HCT116', '1.0') # load PPI data for HCT116 v1

Fanconi_Anemia_node_layout = display_PPI_network_for_complex(ax_HCT116_v1_B, bp_HCT116_v1_df, Corum_DF, 1623, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue', node_pos=Fanconi_Anemia_node_layout)
Exosome_node_layout = display_PPI_network_for_complex(ax_HCT116_v1_D, bp_HCT116_v1_df, Corum_DF, 7443, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue', node_pos=Exosome_node_layout)
COP9_node_layout = display_PPI_network_for_complex(ax_HCT116_v1_F, bp_HCT116_v1_df, Corum_DF, 2174, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue', node_pos=COP9_node_layout)
BCOR_node_layout = display_PPI_network_for_complex(ax_HCT116_v1_H, bp_HCT116_v1_df, Corum_DF, 1178, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue', node_pos=BCOR_node_layout)

# set titles
ax_HCT116_v1_B.set_title('Fanconi Anemia Core Complex\nHCT116 v1.0', color = 'black', fontsize = 10)
ax_HCT116_v1_D.set_title('Exosome complex\nHCT116 v1.0', color = 'black', fontsize = 10)
ax_HCT116_v1_F.set_title('COP9 signalosome complex\nHCT116 v1.0', color = 'black', fontsize = 10)
ax_HCT116_v1_H.set_title('BCOR complex\nHCT116 v1.0', color = 'black', fontsize = 10)

fig = plt.gcf()
fig.set_size_inches(14.25, 7.5)
fig.tight_layout()

# save figure as PNG
#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = False)
plt.show()
_images/BioPlex_Examples_241_0.png

Example 6 - Recreate Figure 2 from Nature 2017 BioPlex 2.0 paper & include HEK293T v3 PPI data

Arp2/3 complex

[100]:
Corum_DF[Corum_DF.complex_id == 27]
[100]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
10 27 Arp2/3 protein complex ARP2/3 protein complex Human Neutrophils 9359840 NaN NaN Transcript levels of ARP2/3 complex subunits, ... NaN ... Arp2/3 protein complex;regulation of actin fil... cellular_component;biological_process NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

TFIIH transcription factor complex

[101]:
Corum_DF[Corum_DF.complex_id == 1029]
[101]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
459 1029 TFIIH transcription factor complex NaN Human HeLa cells 8692842 Transcription factor IIH (TFIIH) is a multisub... NaN NaN NaN ... nucleus;DNA repair;regulation of DNA-templated... cellular_component;biological_process;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

Checkpoint RAD complex

[102]:
Corum_DF[Corum_DF.complex_id == 274]
[102]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
119 274 9-1-1-RAD17-RFC complex RAD17-RFC-9-1-1 checkpoint supercomplex Human in vitro, human cells expressed in H5 cells 12578958 Rad17-RFC complex binds to nicked circular, ga... NaN NaN NaN ... DNA damage checkpoint signaling;DNA binding;DN... biological_process;molecular_function;biologic... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

NuA4/Tip60-HAT complex B

[103]:
Corum_DF[Corum_DF.complex_id == 787]
[103]:
complex_id complex_name synonyms organism cell_line pmid comment_complex comment_members comment_disease comment_drug ... functions_go_name functions_go_ontology fcgs_description fcgs_id fcgs_name fcgs_category_id fcgs_category_name fcgs_go_id fcgs_go_name fcgs_go_ontology
358 787 NuA4/Tip60-HAT complex B NaN Human NaN 14966270 The NuA4 histone acetyltransferase (HAT) multi... NaN NaN NaN ... DNA binding;histone acetyltransferase activity... molecular_function;molecular_function;cellular... NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 34 columns

[104]:
fig = plt.figure(constrained_layout=True)
spec = gridspec.GridSpec(ncols=4, nrows=3, figure=fig)
spec.update(wspace=0.01,hspace=0.01) # set the spacing between axes.

# HEK293T 1.0 - create axes objects for HEK293T v1 network
ax_293T_v1_A = fig.add_subplot(spec[0,0])
ax_293T_v1_B = fig.add_subplot(spec[0,1])
ax_293T_v1_C = fig.add_subplot(spec[0,2])
ax_293T_v1_D = fig.add_subplot(spec[0,3])

bp_293T_v1_df = getBioPlex('293T', '1.0') # load PPI data for HEK293T v1

Arp23_node_layout = display_PPI_network_for_complex(ax_293T_v1_A, bp_293T_v1_df, Corum_DF, 27, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue')
TFIIH_node_layout = display_PPI_network_for_complex(ax_293T_v1_B, bp_293T_v1_df, Corum_DF, 1029, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue')
RAD_node_layout = display_PPI_network_for_complex(ax_293T_v1_C, bp_293T_v1_df, Corum_DF, 274, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue')
NuA4_Tip60_node_layout = display_PPI_network_for_complex(ax_293T_v1_D, bp_293T_v1_df, Corum_DF, 787, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:blue', prey_node_color='xkcd:light blue', AP_MS_edge_color='xkcd:blue')

# set titles
ax_293T_v1_A.set_title('Arp2/3 protein complex\n293T v1.0', color = 'black', fontsize = 10)
ax_293T_v1_B.set_title('TFIIH transcription factor complex\n293T v1.0', color = 'black', fontsize = 10)
ax_293T_v1_C.set_title('Checkpoint Rad complex\n293T v1.0', color = 'black', fontsize = 10)
ax_293T_v1_D.set_title('NuA4/Tip60-HAT complex B\n293T v1.0', color = 'black', fontsize = 10)

# HEK293T 2.0 - create axes objects for HEK293T v2 network
ax_293T_v2_E = fig.add_subplot(spec[1,0])
ax_293T_v2_F = fig.add_subplot(spec[1,1])
ax_293T_v2_G = fig.add_subplot(spec[1,2])
ax_293T_v2_H = fig.add_subplot(spec[1,3])

bp_293T_v2_df = getBioPlex('293T', '2.0') # load PPI data for HEK293T v2

Arp23_node_layout = display_PPI_network_for_complex(ax_293T_v2_E, bp_293T_v2_df, Corum_DF, 27, 800, 1.5, node_font_size=5.5, node_pos=Arp23_node_layout)
TFIIH_node_layout = display_PPI_network_for_complex(ax_293T_v2_F, bp_293T_v2_df, Corum_DF, 1029, 800, 1.5, node_font_size=5.5, node_pos=TFIIH_node_layout)
RAD_node_layout = display_PPI_network_for_complex(ax_293T_v2_G, bp_293T_v2_df, Corum_DF, 274, 800, 1.5, node_font_size=5.5, node_pos=RAD_node_layout)
NuA4_Tip60_node_layout = display_PPI_network_for_complex(ax_293T_v2_H, bp_293T_v2_df, Corum_DF, 787, 800, 1.5, node_font_size=5.5, node_pos=NuA4_Tip60_node_layout)

# set titles
ax_293T_v2_E.set_title('Arp2/3 protein complex\n293T v2.0', color = 'black', fontsize = 10)
ax_293T_v2_F.set_title('TFIIH transcription factor complex\n293T v2.0', color = 'black', fontsize = 10)
ax_293T_v2_G.set_title('Checkpoint Rad complex\n293T v2.0', color = 'black', fontsize = 10)
ax_293T_v2_H.set_title('NuA4/Tip60-HAT complex B\n293T v2.0', color = 'black', fontsize = 10)

# HEK293T 3.0 - create axes objects for HEK293T v3 network
ax_293T_v3_I = fig.add_subplot(spec[2,0])
ax_293T_v3_J = fig.add_subplot(spec[2,1])
ax_293T_v3_K = fig.add_subplot(spec[2,2])
ax_293T_v3_L = fig.add_subplot(spec[2,3])

bp_293T_v3_df = getBioPlex('293T', '3.0') # load PPI data for HEK293T v3

Arp23_node_layout = display_PPI_network_for_complex(ax_293T_v3_I, bp_293T_v3_df, Corum_DF, 27, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:green', prey_node_color='xkcd:light green', AP_MS_edge_color='xkcd:green', node_pos=Arp23_node_layout)
TFIIH_node_layout = display_PPI_network_for_complex(ax_293T_v3_J, bp_293T_v3_df, Corum_DF, 1029, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:green', prey_node_color='xkcd:light green', AP_MS_edge_color='xkcd:green', node_pos=TFIIH_node_layout)
RAD_node_layout = display_PPI_network_for_complex(ax_293T_v3_K, bp_293T_v3_df, Corum_DF, 274, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:green', prey_node_color='xkcd:light green', AP_MS_edge_color='xkcd:green', node_pos=RAD_node_layout)
NuA4_Tip60_node_layout = display_PPI_network_for_complex(ax_293T_v3_L, bp_293T_v3_df, Corum_DF, 787, 800, 1.5, node_font_size=5.5, bait_node_color='xkcd:green', prey_node_color='xkcd:light green', AP_MS_edge_color='xkcd:green', node_pos=NuA4_Tip60_node_layout)

# set titles
ax_293T_v3_I.set_title('Arp2/3 protein complex\n293T v3.0', color = 'black', fontsize = 10)
ax_293T_v3_J.set_title('TFIIH transcription factor complex\n293T v3.0', color = 'black', fontsize = 10)
ax_293T_v3_K.set_title('Checkpoint Rad complex\n293T v3.0', color = 'black', fontsize = 10)
ax_293T_v3_L.set_title('NuA4/Tip60-HAT complex B\n293T v3.0', color = 'black', fontsize = 10)

fig = plt.gcf()
fig.set_size_inches(12.75, 11.25)
fig.tight_layout()

# save figure as PNG
#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = False)
plt.show()
_images/BioPlex_Examples_251_0.png

[10] Functions to calculate and visualize physical interactions between chains of a PDB structure and compare them with PPI data

Example 1: Arp 2/3 complex (calling each function separately)

[1] get_UniProts_from_CORUM - function to get the set of UniProt IDs corresponding to a CORUM complex ID

Description

This function takes a CORUM complex ID and CORUM complex DataFrame and returns the corresponding UniProt IDs.

Parameters

  1. DataFrame of CORUM complexes : Pandas DataFrame

  2. CORUM complex ID: int

Returns

UniProt IDs

  • A list of UniProt IDs for the CORUM complex specified.
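The lookup described above amounts to selecting one row of the CORUM table and splitting its subunit field. The following is a minimal sketch of that idea, not the bioplexpy implementation. The column name `subunits_uniprot_id` and the `;` separator are assumptions about the CORUM table layout, and `uniprots_for_complex` is a hypothetical helper. The toy row reuses the Arp2/3 UniProt IDs printed in cell [106] below.

```python
import pandas as pd


def uniprots_for_complex(corum_df, complex_id, uniprot_col="subunits_uniprot_id", sep=";"):
    """Return the UniProt IDs listed for one CORUM complex.

    uniprot_col and sep are assumptions about the CORUM table layout,
    not confirmed bioplexpy internals.
    """
    row = corum_df.loc[corum_df["complex_id"] == complex_id]
    if row.empty:
        raise KeyError(f"complex_id {complex_id} not found")
    # the subunit field is assumed to hold all accessions in one delimited string
    return row.iloc[0][uniprot_col].split(sep)


# toy one-row table (hypothetical layout, IDs taken from the Arp2/3 output below)
toy_corum = pd.DataFrame({
    "complex_id": [27],
    "subunits_uniprot_id": ["P61160;P61158;O15143;O15144;O15145;P59998;O15511"],
})
print(uniprots_for_complex(toy_corum, 27))
```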

[105]:
Corum_DF = getCorum() # (1) Obtain CORUM complexes
UniProts_Arp_2_3 = get_UniProts_from_CORUM(Corum_DF = Corum_DF, Complex_ID = 27) # (2) Get set of UniProt IDs for specified protein complex (Arp 2/3 complex ID: 27)
[106]:
print(UniProts_Arp_2_3)
['P61160', 'P61158', 'O15143', 'O15144', 'O15145', 'P59998', 'O15511']

[2] get_PDB_from_UniProts - function to get the set of PDB IDs corresponding to a set of UniProt IDs

Description

This function takes a list of UniProt IDs (for example, the IDs returned by get_UniProts_from_CORUM() for a CORUM complex) and maps them to PDB IDs using the SIFTS project. It pulls some metadata for each PDB ID from the PDB and returns everything in a DataFrame.

Parameters

  1. UniProt IDs : list

Returns

PDB IDs and associated metadata

  • Pandas DataFrame of PDB IDs that map to the input UniProt IDs (e.g., the UniProt IDs of a CORUM complex), with associated metadata.

[107]:
PDB_IDs_df = get_PDB_from_UniProts(UniProts_Arp_2_3)
[108]:
PDB_IDs_df
[108]:
num_proteins deposit_date citation_title UniProts_mapped_to_PDB num_proteins_diff_btwn_PDB_and_UniProts_input
6YW7 7 2020-04-29 00:00:00+00:00 Cryo-EM of human Arp2/3 complexes provides str... [P61160, P61158, O15144, O15145, P59998, O15511] 0
6YW6 7 2020-04-29 00:00:00+00:00 Cryo-EM of human Arp2/3 complexes provides str... [P61160, P61158, O15143, O15144, O15145, P59998] 0
6UHC 8 2019-09-27 00:00:00+00:00 Cryo-EM structure of NPF-bound human Arp2/3 co... [P61160, P61158, O15143, O15144, O15145, P5999... 1
9I2B 10 2025-01-20 00:00:00+00:00 Arp2/3-mediated bidirectional actin assembly b... [P61160, P61158, O15143, O15144, O15145, P59998] 3
8P94 12 2023-06-05 00:00:00+00:00 Cortactin stabilizes actin branches by bridgin... [P61160, P61158, O15143, O15144, O15145, P59998] 5

[3] get_interacting_chains_from_PDB - function that takes a PDB ID and gets the chains that are physically close to each other.

Description

This function downloads the PDB structure for the input PDB ID into the input directory. It then computes the pairwise distances between all atoms for each pair of chains in the structure. It returns a list of interacting chain pairs, meaning pairs with at least one pair of atoms less than dist_threshold angstroms apart.

Parameters

  1. PDB ID: str

  2. directory to store PDB file: str

  3. distance threshold: int

Returns

Interacting Chains

  • List of chain pairs from PDB structure that have at least one pair of atoms located < distance threshold apart.
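The all-atom distance test can be sketched with NumPy broadcasting. This is a minimal sketch, not bioplexpy's code. It assumes the atom coordinates (in angstroms) have already been parsed from the structure file, for example with a PDB parser such as Biopython's, into a dict of chain ID -> coordinate array. `interacting_chain_pairs` is a hypothetical name.

```python
from itertools import combinations

import numpy as np


def interacting_chain_pairs(chain_coords, dist_threshold=6.0):
    """Return chain pairs with at least one atom pair closer than dist_threshold.

    chain_coords: dict mapping chain ID -> (n_atoms, 3) array-like of coordinates.
    """
    pairs = []
    for chain_a, chain_b in combinations(chain_coords, 2):
        coords_a = np.asarray(chain_coords[chain_a], dtype=float)
        coords_b = np.asarray(chain_coords[chain_b], dtype=float)
        # (n_a, n_b) matrix of Euclidean distances between every atom pair
        diffs = coords_a[:, None, :] - coords_b[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        if dists.min() < dist_threshold:
            pairs.append([chain_a, chain_b])
    return pairs


# toy structure: A and B come within 4.5 angstroms, C is far from both
toy_structure = {
    "A": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "B": [[5.5, 0.0, 0.0]],
    "C": [[20.0, 0.0, 0.0]],
}
print(interacting_chain_pairs(toy_structure))  # [['A', 'B']]
```

The dense distance matrix costs memory proportional to n_a x n_b for each chain pair. For large assemblies, a spatial index such as `scipy.spatial.cKDTree` is a common alternative.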

[3] Choose a PDB ID and get the chains that are physically close to each other.

[109]:
PDB_ID = '6YW7'
[110]:
protein_structure_dir = os.path.join(os.getcwd(),"protein_function_testing")

interacting_chains_list = get_interacting_chains_from_PDB(PDB_ID, protein_structure_dir, 6)
Downloading PDB structure '6yw7'...
[111]:
interacting_chains_list
[111]:
[['A', 'D'],
 ['A', 'E'],
 ['A', 'B'],
 ['D', 'F'],
 ['B', 'F'],
 ['B', 'G'],
 ['F', 'G'],
 ['F', 'C'],
 ['G', 'C']]

[4] list_uniprot_pdb_mappings - function that returns a mapping from PDB chain ID -> UniProt ID for a given PDB ID.

Description

This function retrieves PDB > UniProt mappings using the get_mappings_data() function, then parses the resulting JSON to construct a dictionary. Each key is a chain from the PDB structure, and each value is the list of UniProt IDs that the SIFTS project maps to that chain (modified from https://github.com/PDBeurope/pdbe-api-training/blob/master/api_tutorials/5_PDB_to_UniProt_mappings_with_SIFTS.ipynb)

Parameters

  1. pdb_id: str

Returns

Chain to UniProt Map

  • Dictionary of PDB ID chain to UniProt ID mappings.
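The parsing step can be sketched as a walk over the SIFTS JSON. This is a minimal sketch under an assumption: the response is taken to have the layout of the PDBe `/mappings/uniprot/{pdb_id}` endpoint used in the linked tutorial, namely lowercase PDB ID -> "UniProt" -> accession -> "mappings" list, with each segment carrying a "chain_id". `toy_response` is a hand-trimmed stand-in, not real API output, and `chain_to_uniprot_from_sifts` is a hypothetical helper.

```python
def chain_to_uniprot_from_sifts(mappings_json, pdb_id):
    """Build {chain_id: [UniProt accessions]} from a SIFTS-style mappings dict."""
    chain_map = {}
    for accession, entry in mappings_json[pdb_id.lower()]["UniProt"].items():
        for segment in entry["mappings"]:
            accessions = chain_map.setdefault(segment["chain_id"], [])
            # a chain can be split into several mapped segments; record each accession once
            if accession not in accessions:
                accessions.append(accession)
    return chain_map


# hypothetical, heavily trimmed response in the assumed layout
toy_response = {
    "6yw7": {
        "UniProt": {
            "P61158": {"mappings": [{"chain_id": "A", "entity_id": 1}]},
            "P61160": {"mappings": [{"chain_id": "B", "entity_id": 2},
                                    {"chain_id": "B", "entity_id": 2}]},
        }
    }
}
print(chain_to_uniprot_from_sifts(toy_response, "6YW7"))  # {'A': ['P61158'], 'B': ['P61160']}
```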

[112]:
chain_to_UniProt_mapping_dict = list_uniprot_pdb_mappings(PDB_ID)
[113]:
chain_to_UniProt_mapping_dict
[113]:
{'C': ['Q92747'],
 'D': ['O15144'],
 'E': ['O15145'],
 'B': ['P61160'],
 'G': ['O15511'],
 'F': ['P59998'],
 'A': ['P61158']}

[5] PDB_chains_to_uniprot - function that takes interacting chains and a chain > UniProt mapping and returns the interacting UniProt IDs.

Description

This function takes the list of interacting chains from function get_interacting_chains_from_PDB() and the chain to UniProt mappings from function list_uniprot_pdb_mappings() and returns a list of interacting chains using UniProt IDs.

Parameters

  1. Interacting Chains: list

  2. Chain to UniProt Map: dict

Returns

Interacting Chains

  • List of interacting chains using UniProt IDs.
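The relabeling is a lookup of each chain in the mapping. The sketch below is not the bioplexpy source. It assumes that each chain maps to a single UniProt ID (it takes the first one) and that unmapped chains are skipped. `chain_pairs_to_uniprot` is a hypothetical name. Run on the 6YW7 inputs from cells [111] and [113], it reproduces the output shown in cell [115] below.

```python
def chain_pairs_to_uniprot(chain_pairs, chain_map):
    """Relabel chain pairs with UniProt IDs.

    Assumes each chain maps to one UniProt ID (the first is used);
    pairs involving an unmapped chain are skipped.
    """
    uniprot_pairs = []
    for chain_a, chain_b in chain_pairs:
        if chain_map.get(chain_a) and chain_map.get(chain_b):
            uniprot_pairs.append([chain_map[chain_a][0], chain_map[chain_b][0]])
    return uniprot_pairs


# inputs copied from the 6YW7 outputs in cells [111] and [113]
chains_6yw7 = [['A', 'D'], ['A', 'E'], ['A', 'B'], ['D', 'F'], ['B', 'F'],
               ['B', 'G'], ['F', 'G'], ['F', 'C'], ['G', 'C']]
chain_map_6yw7 = {'C': ['Q92747'], 'D': ['O15144'], 'E': ['O15145'], 'B': ['P61160'],
                  'G': ['O15511'], 'F': ['P59998'], 'A': ['P61158']}
print(chain_pairs_to_uniprot(chains_6yw7, chain_map_6yw7)[:3])
# [['P61158', 'O15144'], ['P61158', 'O15145'], ['P61158', 'P61160']]
```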

[114]:
interacting_UniProt_IDs = PDB_chains_to_uniprot(interacting_chains_list, chain_to_UniProt_mapping_dict)
[115]:
interacting_UniProt_IDs
[115]:
[['P61158', 'O15144'],
 ['P61158', 'O15145'],
 ['P61158', 'P61160'],
 ['O15144', 'P59998'],
 ['P61160', 'P59998'],
 ['P61160', 'O15511'],
 ['P59998', 'O15511'],
 ['P59998', 'Q92747'],
 ['O15511', 'Q92747']]

[6] display_PDB_network_for_complex - function to visualize interacting chains as a network.

Description

This function uses NetworkX to display a complete network. Nodes represent the proteins (UniProt IDs) in a specified PDB structure, and edges represent every possible pairwise interaction between them. Edges classified as interacting are colored black; these are chain pairs with atoms closer than the distance threshold, which is 6 angstroms in these examples.

Parameters

  1. ax object to draw on: Matplotlib Axes

  2. Mapping of Chains to UniProt IDs: dictionary

  3. List of Interacting Chains: list

  4. Size of Nodes in Network: int

  5. Width of Edges in Network: float

  6. Size of font for Node Labels: int (optional)

Returns

Node Positions

  • Dictionary of Node Positions in NetworkX layout

Interacting Network Edges

  • List of Edges for Interacting Nodes

Number of Network Edges

  • Float of the Number of Possible Interacting Edges
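The counting behind these return values can be sketched without any drawing. The sketch assumes two things: that nodes are the unique UniProt IDs in the chain mapping, and that the number of possible edges is that of a complete graph on those nodes, n(n-1)/2. This reading agrees with the proportions reported later in this section, but it is an inference, not documented behavior. `pdb_network_edges` and the `P1`...`P4` IDs are hypothetical. `networkx.complete_graph(nodes)` would build the same edge set as a graph object.

```python
from itertools import combinations


def pdb_network_edges(chain_map, interacting_uniprot_pairs):
    """Return (nodes, interacting edge set, number of possible edges)."""
    # one node per unique UniProt ID (first ID per chain)
    nodes = sorted({ids[0] for ids in chain_map.values() if ids})
    # complete graph: every unordered pair of distinct nodes
    possible = {frozenset(pair) for pair in combinations(nodes, 2)}
    # frozensets make (X, Y) and (Y, X) the same undirected edge
    interacting = {frozenset(pair) for pair in interacting_uniprot_pairs
                   if pair[0] != pair[1]} & possible
    return nodes, interacting, len(possible)


toy_map = {'A': ['P1'], 'B': ['P2'], 'C': ['P3'], 'D': ['P4']}
nodes, edges, n_possible = pdb_network_edges(toy_map, [['P1', 'P2'], ['P3', 'P2']])
print(len(edges), n_possible, round(len(edges) / n_possible, 3))  # 2 6 0.333
```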

[7] display_PPI_network_match_PDB - function to visualize BioPlex PPI data as networks using interacting chains network.

Description

This function uses NetworkX to display a complete network. Nodes represent the proteins (UniProt IDs) in a specified PDB structure, and edges represent every possible pairwise interaction between them. Edges observed as interactions in the BioPlex PPI data (detected through AP-MS) are colored darker.

Parameters

  1. ax object to draw on: Matplotlib Axes

  2. Mapping of Chains to UniProt IDs: dictionary

  3. List of Interacting Chains: list

  4. DataFrame of PPIs : Pandas DataFrame

  5. NetworkX Position of Nodes: dict

  6. Size of Nodes in Network: int

  7. Width of Edges in Network: float

  8. Size of font for Node Labels: int (optional)

  9. Color of Nodes targeted as baits: str (optional)

  10. Color of Nodes detected as preys only: str (optional)

  11. Color of Edges observed via AP-MS from PPI data: str (optional)

Returns

Interacting Network Edges

  • List of Edges for Interacting Nodes

Number of Network Edges

  • Float of the Number of Possible Interacting Edges
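Matching BioPlex data to the structure's nodes comes down to keeping the PPI rows whose two partners are both nodes. This is a minimal sketch, not bioplexpy's implementation. It relies only on the `UniprotA`/`UniprotB` columns documented for getBioPlex() and treats interactions as undirected. `observed_ppi_edges` is a hypothetical helper, and `P00001` is a placeholder partner outside the complex.

```python
import pandas as pd


def observed_ppi_edges(ppi_df, nodes):
    """Return the set of undirected edges among `nodes` that appear in ppi_df."""
    node_set = set(nodes)
    observed = set()
    for a, b in zip(ppi_df["UniprotA"], ppi_df["UniprotB"]):
        if a in node_set and b in node_set and a != b:
            observed.add(frozenset((a, b)))  # A-B and B-A collapse to one edge
    return observed


toy_ppi = pd.DataFrame({
    "UniprotA": ["P61158", "P61160", "P61158"],
    "UniprotB": ["P61160", "P61158", "P00001"],  # P00001: placeholder, not a node
})
nodes = ["P61158", "P61160", "O15144"]
print(observed_ppi_edges(toy_ppi, nodes))  # {frozenset({'P61158', 'P61160'})}
```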

[116]:
import matplotlib.pyplot as plt
from matplotlib import gridspec

fig = plt.figure(constrained_layout=True)
spec = gridspec.GridSpec(ncols=4, nrows=1, figure=fig) # define 4 columns since we'll have four networks
spec.update(wspace=0.025) # set the spacing between axes

# chain/uniprot physical interaction network
ax4 = fig.add_subplot(spec[3])
node_layout_pdb, edges_list_pdb, num_possible_edges_pdb = display_PDB_network_for_complex(ax4, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, 1500, 3, node_font_size = 8)
prop_edges_pdb = round(float(len(edges_list_pdb)) / num_possible_edges_pdb, 3)
ax4.set_title(f'Direct Interactions from PDB structure\nproportion edges detected = {prop_edges_pdb}', color = 'black', fontsize = 14) # set title

# chain/uniprot PPI interaction networks
ax1 = fig.add_subplot(spec[0])
bp_293T_v1_df = getBioPlex('293T', '1.0') # load PPI data for HEK293T v1
edges_list_bp, num_possible_edges_bp = display_PPI_network_match_PDB(ax1, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, bp_293T_v1_df, node_layout_pdb, 1500, 3, node_font_size = 8)
prop_edges_bp = round(float(len(edges_list_bp)) / num_possible_edges_bp, 3)
ax1.set_title(f'BioPlex 293T v1.0 Interactions\nproportion edges detected = {prop_edges_bp}', color = 'black', fontsize = 14) # set title

ax2 = fig.add_subplot(spec[1])
bp_293T_v2_df = getBioPlex('293T', '2.0') # load PPI data for HEK293T v2
edges_list_bp, num_possible_edges_bp = display_PPI_network_match_PDB(ax2, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, bp_293T_v2_df, node_layout_pdb, 1500, 3, node_font_size = 8)
prop_edges_bp = round(float(len(edges_list_bp)) / num_possible_edges_bp, 3)
ax2.set_title(f'BioPlex 293T v2.0 Interactions\nproportion edges detected = {prop_edges_bp}', color = 'black', fontsize = 14) # set title

ax3 = fig.add_subplot(spec[2])
bp_293T_v3_df = getBioPlex('293T', '3.0') # load PPI data for HEK293T v3
edges_list_bp, num_possible_edges_bp = display_PPI_network_match_PDB(ax3, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, bp_293T_v3_df, node_layout_pdb, 1500, 3, node_font_size = 8)
prop_edges_bp = round(float(len(edges_list_bp)) / num_possible_edges_bp, 3)
ax3.set_title(f'BioPlex 293T v3.0 Interactions\nproportion edges detected = {prop_edges_bp}', color = 'black', fontsize = 14) # set title

fig = plt.gcf()
fig.set_size_inches(20, 5)
fig.tight_layout()

# save figure as PNG
#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = True)
plt.show()
_images/BioPlex_Examples_315_0.png

Find the edges that are detected in both the PDB structure network and the BioPlex 293T v3.0 network.

[117]:
overlapping_edges_list = list(set(edges_list_pdb).intersection(set(edges_list_bp)))
overlapping_edges_list
[117]:
[('P61160', 'P61158'),
 ('Q92747', 'O15511'),
 ('O15145', 'P61158'),
 ('O15144', 'P61158')]
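The cell above intersects sets of tuples, so ('P61160', 'P61158') and ('P61158', 'P61160') would count as different edges. The two display functions may well emit pairs in a consistent order; we have not checked. If they might not, normalizing each edge to an unordered pair first is a safer comparison. `overlapping_edges` is a hypothetical helper.

```python
def overlapping_edges(edges_a, edges_b):
    """Edges present in both lists, ignoring the order within each pair."""
    return {frozenset(e) for e in edges_a} & {frozenset(e) for e in edges_b}


print(overlapping_edges([('P61160', 'P61158'), ('O15144', 'P59998')],
                        [('P61158', 'P61160')]))
# {frozenset({'P61158', 'P61160'})}
```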

Calculate the proportion of possible edges that are detected in both the PDB structure network and the BioPlex 293T v3.0 network.

[118]:
round(float(len(overlapping_edges_list)) / num_possible_edges_pdb, 3)
[118]:
0.19
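The value 0.19 can be checked by hand if num_possible_edges_pdb is read as the edge count of a complete graph on the 7 UniProt IDs in the 6YW7 mapping. This reading is inferred from the outputs, not from bioplexpy documentation. On that reading, C(7, 2) = 21 possible edges, and 4 overlapping edges give 4/21.

```python
from math import comb

n_proteins = 7  # unique UniProt IDs in chain_to_UniProt_mapping_dict for 6YW7
n_possible = comb(n_proteins, 2)  # complete graph: n(n-1)/2
n_overlap = 4  # length of overlapping_edges_list above
print(n_possible, round(n_overlap / n_possible, 3))  # 21 0.19
```

The same arithmetic matches the TFIIH result at the end of this section: 8 UniProt IDs give C(8, 2) = 28 possible edges, and 11/28 rounds to 0.393.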

Example 2: TFIIH Complex (using wrapper function)

[1] Get the set of UniProt IDs corresponding to a CORUM complex ID

[119]:
Corum_DF = getCorum() # (1) Obtain CORUM complexes
UniProts_TFIIH = get_UniProts_from_CORUM(Corum_DF = Corum_DF, Complex_ID = 107) # (2) Get set of UniProt IDs for specified protein complex (TFIIH complex ID: 107)
[120]:
print(UniProts_TFIIH)
['P51946', 'P50613', 'P18074', 'P19447', 'P32780', 'Q13888', 'Q13889', 'Q92759', 'P51948']

[2] Get the set of PDB IDs corresponding to a set of UniProt IDs

[121]:
PDB_IDs_df = get_PDB_from_UniProts(UniProts_TFIIH)
[122]:
PDB_IDs_df.head()
[122]:
num_proteins deposit_date citation_title UniProts_mapped_to_PDB num_proteins_diff_btwn_PDB_and_UniProts_input
8EBY 9 2022-08-31 00:00:00+00:00 Lesion recognition by XPC, TFIIH and XPA in DN... [P18074, P19447, P32780, Q13888, Q13889, Q92759] 0
8EBU 9 2022-08-31 00:00:00+00:00 Lesion recognition by XPC, TFIIH and XPA in DN... [P18074, P19447, P32780, Q13888, Q13889, Q92759] 0
8EBW 10 2022-08-31 00:00:00+00:00 Lesion recognition by XPC, TFIIH and XPA in DN... [P18074, P19447, P32780, Q13888, Q13889, Q92759] 1
8EBV 10 2022-08-31 00:00:00+00:00 Lesion recognition by XPC, TFIIH and XPA in DN... [P18074, P19447, P32780, Q13888, Q13889, Q92759] 1
8EBT 10 2022-08-31 00:00:00+00:00 Lesion recognition by XPC, TFIIH and XPA in DN... [P18074, P19447, P32780, Q13888, Q13889, Q92759] 1

[3] PDB_to_interacting_chains_uniprot_maps - wrapper function that takes a PDB ID, gets the chains that are physically close to each other (labeled with UniProt IDs), and returns a mapping from PDB chain ID -> UniProt ID for that PDB ID

Description

This is a wrapper function for the functions

    1. get_interacting_chains_from_PDB()

    2. list_uniprot_pdb_mappings()

    3. PDB_chains_to_uniprot()

It returns a list of interacting chains from the PDB structure, labeled with UniProt IDs, together with the chain-to-UniProt mapping for this PDB structure.

Parameters

  1. PDB ID: str

  2. directory to store PDB file: str

  3. distance threshold: int

Returns

Chain to UniProt Map

  • Dictionary of PDB ID chain to UniProt ID mappings

Interacting Chains

  • List of interacting chains using UniProt IDs
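Because the wrapper combines the three steps from Example 1, it should be equivalent to calling those functions in sequence with the same positional arguments used in cells [110], [112] and [114]. The sketch below is a hypothetical re-composition for illustration, not the wrapper's source. It imports bioplexpy lazily so that the definition itself does not need the package installed.

```python
def pdb_to_interacting_uniprots(pdb_id, structure_dir, dist_threshold=6):
    """Hypothetical re-composition of PDB_to_interacting_chains_uniprot_maps."""
    from bioplexpy.analysis_funcs import (
        PDB_chains_to_uniprot,
        get_interacting_chains_from_PDB,
        list_uniprot_pdb_mappings,
    )

    # step 1: chain pairs with at least one atom pair < dist_threshold angstroms apart
    chain_pairs = get_interacting_chains_from_PDB(pdb_id, structure_dir, dist_threshold)
    # step 2: PDB chain ID -> list of UniProt IDs (SIFTS)
    chain_map = list_uniprot_pdb_mappings(pdb_id)
    # step 3: relabel the chain pairs with UniProt IDs
    uniprot_pairs = PDB_chains_to_uniprot(chain_pairs, chain_map)
    # same return order as the wrapper call in cell [123]
    return chain_map, uniprot_pairs
```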

[123]:
protein_structure_dir = os.path.join(os.getcwd(),"protein_function_testing")
chain_to_UniProt_mapping_dict, interacting_UniProt_IDs = PDB_to_interacting_chains_uniprot_maps('6NMI', protein_structure_dir, 6)
Downloading PDB structure '6nmi'...
[124]:
chain_to_UniProt_mapping_dict
[124]:
{'H': ['P51948'],
 'C': ['P32780'],
 'D': ['Q92759'],
 'E': ['Q13888'],
 'B': ['P18074'],
 'G': ['Q6ZYL4'],
 'F': ['Q13889'],
 'A': ['P19447']}
[125]:
interacting_UniProt_IDs
[125]:
[['P19447', 'P18074'],
 ['P19447', 'Q92759'],
 ['P19447', 'Q13888'],
 ['P19447', 'Q6ZYL4'],
 ['P19447', 'P51948'],
 ['P18074', 'P32780'],
 ['P18074', 'Q13888'],
 ['P18074', 'P51948'],
 ['P32780', 'Q92759'],
 ['P32780', 'Q13888'],
 ['P32780', 'Q13889'],
 ['Q92759', 'Q13889'],
 ['Q92759', 'Q6ZYL4'],
 ['Q13888', 'Q13889']]

[4] Visualize interacting chains and corresponding BioPlex PPI data as networks.

[126]:
import matplotlib.pyplot as plt
from matplotlib import gridspec

fig = plt.figure(constrained_layout=True)
spec = gridspec.GridSpec(ncols=4, nrows=1, figure=fig) # define 4 columns since we'll have four networks
spec.update(wspace=0.025) # set the spacing between axes

# chain/uniprot physical interaction network
ax4 = fig.add_subplot(spec[3])
node_layout_pdb, edges_list_pdb, num_possible_edges_pdb = display_PDB_network_for_complex(ax4, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, 1500, 3, node_font_size = 8)
prop_edges_pdb = round(float(len(edges_list_pdb)) / num_possible_edges_pdb, 3)
ax4.set_title(f'Direct Interactions from PDB structure\nproportion edges detected = {prop_edges_pdb}', color = 'black', fontsize = 14) # set title

# chain/uniprot PPI interaction networks
ax1 = fig.add_subplot(spec[0])
bp_293T_v1_df = getBioPlex('293T', '1.0') # load PPI data for HEK293T v1
edges_list_bp, num_possible_edges_bp = display_PPI_network_match_PDB(ax1, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, bp_293T_v1_df, node_layout_pdb, 1500, 3, node_font_size = 8)
prop_edges_bp = round(float(len(edges_list_bp)) / num_possible_edges_bp, 3)
ax1.set_title(f'BioPlex 293T v1.0 Interactions\nproportion edges detected = {prop_edges_bp}', color = 'black', fontsize = 14) # set title

ax2 = fig.add_subplot(spec[1])
bp_293T_v2_df = getBioPlex('293T', '2.0') # load PPI data for HEK293T v2
edges_list_bp, num_possible_edges_bp = display_PPI_network_match_PDB(ax2, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, bp_293T_v2_df, node_layout_pdb, 1500, 3, node_font_size = 8)
prop_edges_bp = round(float(len(edges_list_bp)) / num_possible_edges_bp, 3)
ax2.set_title(f'BioPlex 293T v2.0 Interactions\nproportion edges detected = {prop_edges_bp}', color = 'black', fontsize = 14) # set title

ax3 = fig.add_subplot(spec[2])
bp_293T_v3_df = getBioPlex('293T', '3.0') # load PPI data for HEK293T v3
edges_list_bp, num_possible_edges_bp = display_PPI_network_match_PDB(ax3, chain_to_UniProt_mapping_dict, interacting_UniProt_IDs, bp_293T_v3_df, node_layout_pdb, 1500, 3, node_font_size = 8)
prop_edges_bp = round(float(len(edges_list_bp)) / num_possible_edges_bp, 3)
ax3.set_title(f'BioPlex 293T v3.0 Interactions\nproportion edges detected = {prop_edges_bp}', color = 'black', fontsize = 14) # set title

fig = plt.gcf()
fig.set_size_inches(20, 5)
fig.tight_layout()

# save figure as PNG
#plt.savefig(fig_out_path, bbox_inches='tight', dpi = 300 , transparent = True)
plt.show()
_images/BioPlex_Examples_338_0.png

Find the edges that are detected in both the PDB structure network and the BioPlex 293T v3.0 network (edges_list_bp holds the edges from the last network drawn, 293T v3.0).

[127]:
overlapping_edges_list = list(set(edges_list_pdb).intersection(set(edges_list_bp)))
overlapping_edges_list
[127]:
[('Q92759', 'Q6ZYL4'),
 ('P32780', 'Q13888'),
 ('P32780', 'Q92759'),
 ('Q6ZYL4', 'P19447'),
 ('Q13888', 'P19447'),
 ('Q92759', 'Q13889'),
 ('P32780', 'Q13889'),
 ('P51948', 'P18074'),
 ('Q13888', 'Q13889'),
 ('P51948', 'P19447'),
 ('Q92759', 'P19447')]

Calculate the proportion of possible edges that are detected in both the PDB structure network and the BioPlex 293T v3.0 network.

[128]:
round(float(len(overlapping_edges_list)) / num_possible_edges_pdb, 3)
[128]:
0.393