" conditions_skip = line.startswith('#') or 'tRNA' in line or 'name=' in line\n",
" conditions_skip = line.startswith('#') or 'tRNA' in line or 'name=' in line\n",
" if not conditions_skip:\n",
" if not conditions_skip:\n",
...
@@ -190,7 +183,7 @@
...
@@ -190,7 +183,7 @@
],
],
"metadata": {
"metadata": {
"kernelspec": {
"kernelspec": {
"display_name": "ascc24",
"display_name": "jupyterhub-5.1.0",
"language": "python",
"language": "python",
"name": "python3"
"name": "python3"
},
},
...
@@ -204,7 +197,7 @@
...
@@ -204,7 +197,7 @@
"name": "python",
"name": "python",
"nbconvert_exporter": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.12.4"
}
}
},
},
"nbformat": 4,
"nbformat": 4,
...
...
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
# Adding functional annotation from EggNOG-mapper
# Adding functional annotation from EggNOG-mapper
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
from tqdm import tqdm
# from tqdm import tqdm # install it for nice progress bars
import pandas as pd
import pandas as pd
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
### util functions
### util functions
We are going to need three helper functions:
We are going to need three helper functions:
- extract the gene ID from the `#query` field of the EggNOG-mapper output
- extract the gene ID from the `#query` field of the EggNOG-mapper output
- break up the content of the attributes field of the GFF file into a dictionary
- break up the content of the attributes field of the GFF file into a dictionary
- find the correct protein name for a gene ID
- find the correct protein name for a gene ID
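
As a rough illustration of the second helper, here is a minimal sketch of splitting an attributes field into a dictionary. It assumes GFF3-style `key=value` pairs separated by semicolons; the function name `parse_gff_attributes` and the example string are hypothetical and may differ from what the notebook actually uses.

```python
def parse_gff_attributes(field):
    """Split a GFF3-style attributes string into a dictionary.

    Assumes 'key=value' pairs separated by ';' (an assumption about the
    input file; GTF-style attributes would need different handling).
    """
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            # skip empty chunks, e.g. after a trailing semicolon
            continue
        key, sep, value = item.partition("=")
        if sep:
            # keep only well-formed key=value pairs
            attributes[key.strip()] = value.strip()
    return attributes


# hypothetical attributes string, for illustration only
example = parse_gff_attributes("ID=g1.t1;Parent=g1;")
```

Using `str.partition` rather than `split("=")` keeps any further `=` characters inside the value intact.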
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
def parse_gene_id(x):
def parse_gene_id(x):
"""Extract gene ID from a string
"""Extract gene ID from a string
Parameters
Parameters
----------
----------
x : str
x : str
A protein ID from the eggNOG-mapper output.
A protein ID from the eggNOG-mapper output.
Returns
Returns
-------
-------
str
str
The gene ID in the format 'PB.X' (PacBio genes), 'gX' (BRAKER round 1), 'r2_gX' (BRAKER round 2), or 'at_DNX' (de-novo transcriptome-assembled genes).
The gene ID in the format 'PB.X' (PacBio genes), 'gX' (BRAKER round 1), 'r2_gX' (BRAKER round 2), or 'at_DNX' (de-novo transcriptome-assembled genes).