a GO tool

Application Programming Interface

aGOtool has an application programming interface (API) which enables you to get data without using the graphical user interface of the web page. The API is convenient if you need programmatic access. Please have a look at the examples.

Python 3 examples


import requests
from io import StringIO
import pandas as pd
# call api_help for help and argument defaults
response = requests.get(r"https://agotool.org/api_help")
print(response.json())
url_ = r"https://agotool.org/api"
    

Example #1 from STRING

Protein name: trpA; Organism: Escherichia coli K12_MG1655


ENSPs = ['511145.b1260', '511145.b1261', '511145.b1262', '511145.b1263', '511145.b1264', '511145.b1812',
     '511145.b2551', '511145.b3117', '511145.b3360', '511145.b3772', '511145.b4388']
fg = "%0d".join(ENSPs)
result = requests.post(url_,
                   params={"output_format": "json",
                           "enrichment_method": "genome",
                           "taxid": 83333}, # UniProt reference proteomes uses "Escherichia coli (strain K12)" with Taxid 83333 as a pan proteome instead of 511145 (TaxID on taxonomic rank of species).
                   data={"foreground": fg})
result_json = result.json()
print("Enrichment results for the following entity types was calculated: \n", result_json.keys()) # See the parameters page for more detais on entity types.
            

Example #2 from STRING

Use iPython or Jupyter notebook to interactively view results as a pandas data frame.

Protein name: CDC15; Organism: Saccharomyces cerevisiae


ENSPs = ['4932.YAR019C', '4932.YFR028C', '4932.YGR092W', '4932.YHR152W', '4932.YIL106W', '4932.YJL076W',
     '4932.YLR079W', '4932.YML064C', '4932.YMR055C', '4932.YOR373W', '4932.YPR119W']
fg = "%0d".join(ENSPs)
result = requests.post(url_,
                   params={"output_format": "tsv",
                           "enrichment_method": "genome",
                           "taxid": 559292}, # UniProt reference proteomes uses "Saccharomyces cerevisiae S288C" with Taxid 559292 as a pan proteome instead of 4932 (TaxID on taxonomic rank of species).
                   data={"foreground": fg})
result = result.text
df = pd.read_csv(StringIO(result), sep='\t')
print(df.head())
        

Example #3 from STRING

Protein name: trpA; Organism: Escherichia coli K12_MG1655


ENSPs = ['10090.ENSMUSP00000001812', '10090.ENSMUSP00000002708', '10090.ENSMUSP00000021921',
     '10090.ENSMUSP00000025791', '10090.ENSMUSP00000026474', '10090.ENSMUSP00000030443',
     '10090.ENSMUSP00000054837', '10090.ENSMUSP00000084430', '10090.ENSMUSP00000099623',
     '10090.ENSMUSP00000106137', '10090.ENSMUSP00000107498']
fg = "%0d".join(ENSPs)
result = requests.post(url_,
                   params={"output_format": "json",
                           "enrichment_method": "compare_samples",
                           "limit_2_entity_type": "-56;-51;-57", # comma separate desired entity_types (e.g. PMIDs, UniProt Keywords, and Reactome).
                           "taxid": 10090},
                   data={"foreground": fg,
                         "background": fg})
# since the foreground equals the background completely no significant enrichment will result from this analysis
# it is meant to show how to show how to use the method call, and to apply a sanity check in case you get no results (try using p value or FDR cutoff of 1)
result_json = result.json()
print(result_json.keys())
          

Example #4

Characterize foreground, get functional information


ENSPs = ["9606.ENSP00000266970"]
fg = "%0d".join(ENSPs)
result = requests.post(url_,
                   params={"output_format": "tsv",
                           "filter_foreground_count_one": False,
                           "enrichment_method": "characterize_foreground"},
                   data={"foreground": fg})
df = pd.read_csv(StringIO(result.text), sep='\t')
print(df.head())
          



API parameters

These parameters reflect the parameters of the graphical user interface of the web page, but some have slightly varying names. They are briefly described below, please see the parameters page for more details.
  • taxid: In order to know which organism's reference proteome to use as the background you're expected to provide the NCBI taxon identifier (e.g. 9606 for Homo sapiens). Please make sure to use the exact TaxID UniProt provides, some cases are can be unexpected e.g. instead of Taxid '4932' for 'Saccharomyces cerevisiae', UniProt provides '559292' for 'Saccharomyces cerevisiae S288C'. We therefore support '559292'. Default = None (not set).
  • species: deprecated, please use "taxid" instead.
  • organism: deprecated, please use "taxid" instead.
  • output_format: The desired format of the output, one of {tsv, tsv-no-header, json, xml}, default = "tvs".
  • filter_parents: Remove parent terms (keep GO terms and UniProt Keywords of lowest leaf) if they are associated with exactly the same foreground. For a more detailed description see Parameters --> Report options --> Filter redundant parent terms. Default=True.
  • filter_foreground_count_one: Keep only those terms with foreground_count > 1. Default = True.
  • filter_PMID_top_n: Filter the top n PMIDs (e.g. 100, default=100), sorting by low p value and recent publication date. Default = 100.
  • caller_identity: Your identifier for us e.g. "www.my_awesome_app.com". Default=None (this parameter exists to comply with STRING).
  • FDR_cutoff: False Discovery Rate cutoff (cutoff for multiple testing corrected p values) e.g. 0.05, default=0.05 meaning 5%. Set to 1 for no cutoff. Default = 0.05
  • limit_2_entity_type: Limit the enrichment analysis to a specific or multiple entity types, e.g. '-21' (for GO molecular function) or '-21;-22;-23;-51' (for all GO terms as well as UniProt Keywords).", default ="-20;-21;-22;-23;-51;-52;-54;-55;-56;-57;-58" which means use all available categories. See parameters page for a translation of entity types.
  • foreground: Protein identifiers for all proteins in the test group (the foreground, the sample, the group you want to examine for GO term enrichment), separate the list of Accession Number using '%0d' e.g. '4932.YAR019C%0d4932.YFR028C%0d4932.YGR092W'. Default = None.
  • background: Protein identifiers for all proteins in the background (the population, the group you want to compare your foreground to) separate the list of Accession Number using '%0d'e.g. '4932.YAR019C%0d4932.YFR028C%0d4932.YGR092W'. Default = None.
  • background_intensity: Protein abundance (intensity) for all proteins (copy number, iBAQ, or any other measure of abundance). Separate the list using '%0d'. The number of items should correspond to the number of Accession Numbers of the 'background' e.g. '12.3%0d3.4'. Default = None.
  • enrichment_method: There are 5 methods available: "abundance_correction", "compare_samples", "compare_groups", "characterize_foreground", "genome". Default = "characterize_foreground".
  • goslim: GO basic or a slim subset {generic, agr, aspergillus, candida, chembl, flybase_ribbon, metagenomics, mouse, pir, plant, pombe, synapse, yeast}. Choose between the full Gene Ontology ('basic') or one of the GO slim subsets (e.g. 'generic' or 'mouse'). GO slim is a subset of GO terms that are less fine grained. 'basic' does not exclude anything, select 'generic' for a subset of broad GO terms, the other subsets are tailor made for a specific taxons / analysis (see GO subset guide). Default = "basic".
  • o_or_u_or_both: over- or under-represented or both, one of {overrepresented, underrepresented, both}. Choose to only test and report overrepresented or underrepresented GO-terms, or to report both of them. Default = "overrepresented".
  • num_bins: The number of bins created based on the abundance values provided. Only relevant if 'Abundance correction' is selected. Default = 100.
  • p_value_cutoff: Apply a filter (value between 0 and 1) for maximum cutoff value of the uncorrected p value. '1' means nothing will be filtered, '0.01' means all uncorected p_values <= 0.01 will be removed from the results (but still tested for multiple correction). Default = 1
  • multiple_testing_per_etype: If True calculate multiple testing correction separately per entity type (functional category), in contrast to performing the correction together for all results. Default = True
  • score_cutoff: Apply a filter for the minimum cutoff value of the textmining score. This cutoff is only applied to the 'characterize_foreground' method, and does not affect p values. Default = 3.
  • foreground_replicates: Provide an integer, defines the number of samples (replicates) of the foreground. Default = 10
  • background_replicates: Analogous to foreground_replicates, defines the number of samples (replicates) of the background. Default = 10