" line = f'{line};name={name} isoform {isoform};gene_name={name}'\n",
" if feature_type == 'CDS' or feature_type == 'exon':\n",
" line = f'{line};gene_name={name}'\n",
" named.write(line + '\\n')"
]
}
...
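%% Cell type:markdown id: tags:
The code above joins each new `name=` or `gene_name=` attribute to column 9 with `;`, because GFF3 separates `tag=value` pairs with semicolons; without the separator the new tag runs into the previous value (e.g. `Parent=g1gene_name=X`). The helper below is a minimal sketch of that step, not the notebook's own code: `append_attributes` is a hypothetical name, and it assumes `line` is one tab-separated GFF line without its newline and that the values contain no tabs, `;` or `=`.
%% Cell type:code id: tags:
```python
def append_attributes(line, **attrs):
    """Append tag=value pairs to column 9 of one tab-separated GFF line.

    Hypothetical helper: assumes `line` has no trailing newline and that
    the values contain no tabs, ';' or '='.
    """
    if not attrs:
        return line  # nothing to add
    fields = line.split('\t')
    if len(fields) != 9:
        raise ValueError(f'expected 9 tab-separated columns, got {len(fields)}')
    existing = fields[8].rstrip(';')  # drop a trailing ';' so it is not doubled
    new = ';'.join(f'{key}={value}' for key, value in attrs.items())
    # '.' marks an empty attributes column in GFF: replace it rather than append
    fields[8] = new if existing in ('', '.') else f'{existing};{new}'
    return '\t'.join(fields)


cds = 'chr1\tBRAKER\tCDS\t100\t200\t.\t+\t0\tID=g1.t1.cds;Parent=g1.t1'
print(append_attributes(cds, gene_name='ABC1').split('\t')[8])
# ID=g1.t1.cds;Parent=g1.t1;gene_name=ABC1
```

Several keyword arguments, e.g. `append_attributes(line, name=f'{name} isoform {isoform}', gene_name=name)`, are written in the order given, which mirrors the `name=` line above.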
%% Cell type:markdown id: tags:
# Adding functional annotation from EggNOG-mapper
%% Cell type:code id: tags:
``` python
# from tqdm import tqdm  # install it for nice progress bars
import pandas as pd
```
%% Cell type:markdown id: tags:
### Utility functions
We need three helper functions to:
- extract the gene ID from the `#query` field of the eggNOG-mapper output
- parse the attributes field (column 9) of the GFF file into a dictionary
- find the correct protein name for a gene ID
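%% Cell type:markdown id: tags:
The second helper can be sketched as follows, assuming the annotation is GFF3, where column 9 holds `tag=value` pairs separated by `;` and reserved characters are percent-encoded (e.g. `%3B` for `;`). The name `parse_attributes` is hypothetical and may differ from the function this notebook defines; GTF-style `tag "value"` attributes are not handled.
%% Cell type:code id: tags:
```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attributes column into a {tag: value} dictionary.

    Sketch only: assumes GFF3 'tag=value' pairs separated by ';'.
    GTF-style 'tag "value"' attributes are not handled.
    """
    attributes = {}
    for pair in column9.strip().split(';'):
        pair = pair.strip()
        if not pair or pair == '.':
            continue  # skip empty entries (e.g. a trailing ';') and the '.' placeholder
        key, _, value = pair.partition('=')
        attributes[key] = unquote(value)  # decode percent-escapes such as %3B
    return attributes


print(parse_attributes('ID=g1.t1;Parent=g1;'))
# {'ID': 'g1.t1', 'Parent': 'g1'}
```

Using `str.partition` rather than `split('=')` splits only at the first `=`, so a stray `=` inside a value does not break the tuple unpacking.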
%% Cell type:code id: tags:
``` python
def parse_gene_id(x):
    """Extract the gene ID from an eggNOG-mapper protein ID.

    Parameters
    ----------
    x : str
        A protein ID from the eggNOG-mapper output.

    Returns
    -------
    str
        The gene ID, in the format 'PB.X' (PacBio genes), 'gX' (BRAKER
        round 1), 'r2_gX' (BRAKER round 2) or 'at_DNX' (de novo
        transcriptome-assembled genes).