a GO tool

Application Programming Interface

aGOtool has two application programming interfaces (APIs) that let you retrieve data without using the graphical user interface of the web page. The APIs are convenient if you need programmatic access. Please have a look at the examples. Why are there two APIs? One already existed (and remains for backwards compatibility), and the other was created to facilitate easy Cytoscape integration.

A.) This API is analogous to STRING's API.

Please follow the STRING API help link for instructions on how to use this API, and
replace the STRING URL https://string-db.org/api with the following aGOtool URL


B.) The second API is described in examples below.


import requests
import pandas as pd
from io import StringIO

url = r"https://agotool.org/api_orig"

# Call api_help for help and to get the default arguments (parameters are described in detail below).
response = requests.get(r"https://agotool.org/api_help")

Example #1

Protein name: trpA; Organism: Escherichia coli K12_MG1655

ENSPs = ['511145.b1260', '511145.b1261', '511145.b1262', '511145.b1263', '511145.b1264', '511145.b1812',
     '511145.b2551', '511145.b3117', '511145.b3360', '511145.b3772', '511145.b4388']
fg = "%0d".join(ENSPs)
result = requests.post(url,
                   params={"output_format": "json",
                           "enrichment_method": "genome",
                           "taxid": 83333},
                           # The UniProt reference proteome uses "Escherichia coli (strain K12)"
                           # with TaxID 83333 as a pan proteome instead of 511145 (the TaxID of substrain MG1655).
                   data={"foreground": fg})
result_json = result.json()
print("Enrichment results for {} number of terms".format(len(result_json)))
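The separator used to build fg above, '%0d', is the percent-encoded form of a carriage return character. The following minimal sketch only illustrates that encoding and the round trip between a list of identifiers and the joined string; it makes no claim about how the server processes the value. The names SEPARATOR, ids, and decoded are illustrative.

```python
from urllib.parse import unquote

# Separator used by the examples on this page.
SEPARATOR = "%0d"

# Two identifiers taken from Example #1.
ids = ["511145.b1260", "511145.b1261"]

# Join the identifiers into a single foreground string ...
fg = SEPARATOR.join(ids)

# ... and split it again to recover the original list.
recovered = fg.split(SEPARATOR)

# Percent-decoding the separator yields a carriage return.
decoded = unquote(SEPARATOR)
```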

Example #2

Use IPython or a Jupyter notebook to interactively view the results as a pandas DataFrame.

Protein name: CDC15; Organism: Saccharomyces cerevisiae

ENSPs = ['4932.YAR019C', '4932.YFR028C', '4932.YGR092W', '4932.YHR152W', '4932.YIL106W', '4932.YJL076W',
     '4932.YLR079W', '4932.YML064C', '4932.YMR055C', '4932.YOR373W', '4932.YPR119W']
fg = "%0d".join(ENSPs)
result = requests.post(url,
                   params={"output_format": "tsv",
                           "enrichment_method": "genome",
                           "taxid": 559292},
                           # The UniProt reference proteome uses "Saccharomyces cerevisiae S288C"
                           # with TaxID 559292 as a pan proteome instead of 4932 (the TaxID at the taxonomic rank of species).
                   data={"foreground": fg})
df = pd.read_csv(StringIO(result.text), sep='\t')
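If pandas is not available, tab-separated output can also be read with the Python standard library. This is a minimal sketch, not part of the aGOtool documentation: the helper parse_tsv is hypothetical, and sample_text is made-up placeholder text standing in for result.text (the column names and values are invented for illustration and do not reflect the real output columns).

```python
import csv
from io import StringIO


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(StringIO(text), delimiter="\t")
    return list(reader)


# Placeholder standing in for result.text; columns and values are made up.
sample_text = "term\tp_value\nTERM_A\t0.01\nTERM_B\t0.2\n"

rows = parse_tsv(sample_text)

# Each row maps column name to the cell value (always a string).
for row in rows:
    print(row["term"], row["p_value"])
```

With a real response, parse_tsv(result.text) would take the place of parse_tsv(sample_text). Note that csv.DictReader leaves all values as strings, so numeric columns need an explicit conversion.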

Example #3

Protein name: smoothened; Organism: Mus musculus

ENSPs = ['10090.ENSMUSP00000001812', '10090.ENSMUSP00000002708', '10090.ENSMUSP00000021921',
     '10090.ENSMUSP00000025791', '10090.ENSMUSP00000026474', '10090.ENSMUSP00000030443',
     '10090.ENSMUSP00000054837', '10090.ENSMUSP00000084430', '10090.ENSMUSP00000099623',
     '10090.ENSMUSP00000106137', '10090.ENSMUSP00000107498']
fg = "%0d".join(ENSPs)
result = requests.post(url,
                   params={"output_format": "json",
                           "enrichment_method": "compare_samples",
                           "limit_2_entity_type": "-56;-51;-57",
                   # separate the desired entity_types with semicolons (e.g. PMIDs, UniProt Keywords, and Reactome).
                           "p_value_cutoff": 1,
                           "FDR_cutoff": 1,
                           "taxid": 10090},
                   data={"foreground": fg,
                         "background": fg})
# Since the foreground is identical to the background, this analysis will not yield any significant enrichment.
# It is meant to show how to use the method call, and to serve as a sanity check in case you get no results (try a p value or FDR cutoff of 1).
result_json = result.json()

Example #4

Characterize foreground, get functional information

ENSPs = ["9606.ENSP00000266970"]
fg = "%0d".join(ENSPs)
result = requests.post(url,
                   params={"output_format": "tsv",
                           "filter_foreground_count_one": False,
                           "enrichment_method": "characterize_foreground"},
                   data={"foreground": fg})
df = pd.read_csv(StringIO(result.text), sep='\t')

API parameters

These parameters reflect the parameters of the graphical user interface of the web page, but some have slightly different names. They are briefly described below; please see the parameters page for more details.
  • taxid: To determine which organism's reference proteome to use as the background, you are expected to provide the NCBI taxon identifier (e.g. 9606 for Homo sapiens). Please make sure to use the exact TaxID that UniProt provides; some cases can be unexpected, e.g. instead of TaxID '4932' for 'Saccharomyces cerevisiae', UniProt provides '559292' for 'Saccharomyces cerevisiae S288C'. We therefore support '559292'. If you provide a taxon of rank 'species', we will attempt to map it to a closely related UniProt reference proteome with the highest number of proteins. Default = None (not set).
  • species: deprecated, please use "taxid" instead.
  • organism: deprecated, please use "taxid" instead.
  • output_format: The desired format of the output, one of {tsv, tsv-no-header, json, xml}. Default = "tsv".
  • filter_parents: Remove parent terms (keep GO terms and UniProt Keywords of lowest leaf) if they are associated with exactly the same foreground. For a more detailed description see Parameters --> Report options --> Filter redundant parent terms. Default=True.
  • filter_foreground_count_one: Keep only those terms with foreground_count > 1. Default = True.
  • filter_PMID_top_n: Keep only the top n PMIDs (e.g. 100), sorted by low p value and recent publication date. Default = 100.
  • caller_identity: Your identifier for us e.g. "www.my_awesome_app.com". Default=None (this parameter exists to comply with STRING).
  • FDR_cutoff: False Discovery Rate cutoff (cutoff for multiple testing corrected p values), e.g. 0.05 meaning 5%. Set to 1 for no cutoff. Default = 0.05.
  • limit_2_entity_type: Limit the enrichment analysis to one or more entity types, e.g. '-21' (for GO molecular function) or '-21;-22;-23;-51' (for all GO terms as well as UniProt Keywords). Default = "-20;-21;-22;-23;-51;-52;-54;-55;-56;-57;-58", which means use all available categories. See the parameters page for a translation of entity types.
  • foreground: Protein identifiers for all proteins in the test group (the foreground, the sample, the group you want to examine for GO term enrichment). Separate the list of accession numbers using '%0d', e.g. '4932.YAR019C%0d4932.YFR028C%0d4932.YGR092W'. Default = None.
  • background: Protein identifiers for all proteins in the background (the population, the group you want to compare your foreground to). Separate the list of accession numbers using '%0d', e.g. '4932.YAR019C%0d4932.YFR028C%0d4932.YGR092W'. Default = None.
  • background_intensity: Protein abundance (intensity) for all proteins (copy number, iBAQ, or any other measure of abundance). Separate the list using '%0d', e.g. '12.3%0d3.4'. The number of items should correspond to the number of accession numbers in 'background'. Default = None.
  • enrichment_method: There are 5 methods available: "abundance_correction", "compare_samples", "compare_groups", "characterize_foreground", "genome". Default = "characterize_foreground".
  • goslim: GO basic or a slim subset, one of {basic, generic, agr, aspergillus, candida, chembl, flybase_ribbon, metagenomics, mouse, pir, plant, pombe, synapse, yeast}. Choose between the full Gene Ontology ('basic') or one of the GO slim subsets (e.g. 'generic' or 'mouse'). A GO slim is a subset of GO terms that is less fine-grained. 'basic' does not exclude anything; select 'generic' for a subset of broad GO terms. The other subsets are tailor-made for specific taxa or analyses (see GO subset guide). Default = "basic".
  • o_or_u_or_both: over- or under-represented or both, one of {overrepresented, underrepresented, both}. Choose to only test and report overrepresented or underrepresented GO-terms, or to report both of them. Default = "overrepresented".
  • num_bins: The number of bins created based on the abundance values provided. Only relevant if 'Abundance correction' is selected. Default = 100.
  • p_value_cutoff: Maximum allowed uncorrected p value (a value between 0 and 1). '1' means nothing will be filtered; '0.01' means all terms with an uncorrected p value > 0.01 will be removed from the results (but they are still included in the multiple testing correction). Default = 1.
  • multiple_testing_per_etype: If True, calculate the multiple testing correction separately per entity type (functional category), in contrast to performing the correction together for all results. Default = True.
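The parameters above can be assembled and checked before a request is sent. The following is a minimal sketch under stated assumptions, not an official client: build_request, join_items, and the constant names are hypothetical helpers. The sketch assumes that 'background_intensity' is sent in the request body alongside 'foreground' and 'background', which the examples above do not confirm. The allowed values are copied from the parameter descriptions.

```python
# Allowed values, copied from the parameter descriptions above.
OUTPUT_FORMATS = {"tsv", "tsv-no-header", "json", "xml"}
ENRICHMENT_METHODS = {"abundance_correction", "compare_samples",
                      "compare_groups", "characterize_foreground", "genome"}
DIRECTIONS = {"overrepresented", "underrepresented", "both"}
SEPARATOR = "%0d"


def join_items(items):
    """Join identifiers or intensities with the '%0d' separator."""
    return SEPARATOR.join(str(item) for item in items)


def build_request(foreground, taxid=None, background=None,
                  background_intensity=None, entity_types=None,
                  enrichment_method="characterize_foreground",
                  output_format="tsv", o_or_u_or_both="overrepresented"):
    """Return (params, data) dicts for a POST request (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError("unknown output_format: %r" % output_format)
    if enrichment_method not in ENRICHMENT_METHODS:
        raise ValueError("unknown enrichment_method: %r" % enrichment_method)
    if o_or_u_or_both not in DIRECTIONS:
        raise ValueError("unknown o_or_u_or_both: %r" % o_or_u_or_both)

    params = {"output_format": output_format,
              "enrichment_method": enrichment_method,
              "o_or_u_or_both": o_or_u_or_both}
    if taxid is not None:
        params["taxid"] = taxid
    if entity_types:
        # Entity types are separated by semicolons, e.g. "-56;-51;-57".
        params["limit_2_entity_type"] = ";".join(str(e) for e in entity_types)

    data = {"foreground": join_items(foreground)}
    if background is not None:
        data["background"] = join_items(background)
    if background_intensity is not None:
        # One intensity per background protein, as described above.
        if background is None or len(background_intensity) != len(background):
            raise ValueError("background_intensity must match background in length")
        # Assumption: intensities travel in the request body like the identifiers.
        data["background_intensity"] = join_items(background_intensity)
    return params, data


params, data = build_request(["4932.YAR019C", "4932.YFR028C"], taxid=559292,
                             enrichment_method="genome",
                             entity_types=[-56, -51, -57])
```

The returned pair can then be passed on in the same way as in the examples, i.e. requests.post(url, params=params, data=data). Validating locally catches misspelled values such as "tvs" before any network traffic occurs.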