mecfs_bio.build_system.task.fetch_gget_info_task
Task that uses gget to annotate gene lists with information from genetics databases.
Classes:
- FetchGGetInfoTask – Task to use gget (https://github.com/pachterlab/gget) to retrieve database information about a list of genes from a dataframe.
Attributes:
- PRIMARY_GENE_NAME
- PROTEIN_NAMES_COL
- SUBCELLULAR_LOCALISATION_COL
- UNIPROT_DESCRIPTION
- UNIPROT_ID_COL
- logger
SUBCELLULAR_LOCALISATION_COL (module attribute)
FetchGGetInfoTask
Bases: Task
Task to use gget (https://github.com/pachterlab/gget) to retrieve database information about a list of genes from a dataframe. Useful for analyzing GWAS results.
Listen to an interview with the primary developer of gget here: https://podcasts.apple.com/nz/podcast/99-laura-luebbert-gget-hunting-viruses-and/id1534473511?i=1000664104787
gget sometimes returns dataframes with inconsistent formatting; for example, some columns contain a mix of lists and singleton values. This module therefore also contains functionality to munge gget's output into a more consistent format.
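As a rough illustration of the kind of munging described above, the sketch below collapses a cell that may be either a list or a scalar into a single delimited string, mapping empty lists, `None`, and NaN to `None`. The helper name `normalize_cell` and the `"; "` separator are hypothetical choices for illustration; this is not necessarily how this module normalizes gget output.

```python
from __future__ import annotations

import math
from typing import Any


def normalize_cell(value: Any, sep: str = "; ") -> str | None:
    """Hypothetical helper: collapse a list-or-scalar gget cell into one string."""
    # Missing values (None or a float NaN, as pandas uses) become None.
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    # List-like cells are joined; an empty list is treated as missing.
    if isinstance(value, (list, tuple)):
        items = [str(v) for v in value if v is not None]
        return sep.join(items) if items else None
    # Singleton values are simply stringified.
    return str(value)


# A column mixing lists and singleton values, as gget can return.
column = [["Nucleus", "Cytoplasm"], "Membrane", [], None]
normalized = [normalize_cell(v) for v in column]
```

With pandas, the same helper could be applied column-wise, e.g. `df[col] = df[col].map(normalize_cell)`. Joining into a string keeps each column a single dtype, at the cost of losing the list structure.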
Methods:
Attributes:
- deps (list[Task])
- ensembl_id_col (str)
- genes_to_use (int | None)
- meta (Meta)
- out_format (OutFormat)
- post_pipe (DataProcessingPipe)
- source_df_task (Task)
- source_id (AssetId)
- source_meta (Meta)
create (classmethod)
create(
    asset_id: str,
    source_df_task: Task,
    ensembl_id_col: str,
    genes_to_use: int | None = None,
    post_pipe: DataProcessingPipe = IdentityPipe(),
    out_format: OutFormat = CSVOutFormat(","),
)
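To show how the `ensembl_id_col` and `genes_to_use` parameters might interact, the sketch below pulls Ensembl IDs out of row records and caps how many are kept. The helper `select_gene_ids`, the deduplication step, and the reading of `genes_to_use` as "use at most the first N genes" are all assumptions made for illustration; this documentation does not confirm them.

```python
from __future__ import annotations


def select_gene_ids(
    rows: list[dict[str, object]],
    ensembl_id_col: str,
    genes_to_use: int | None = None,
) -> list[str]:
    """Hypothetical helper: collect unique Ensembl IDs in first-seen order."""
    seen: set[str] = set()
    ids: list[str] = []
    for row in rows:
        value = row.get(ensembl_id_col)
        # Skip rows with a missing or empty ID.
        if value is None or value == "":
            continue
        gene_id = str(value)
        # Assumption: duplicate IDs are queried only once.
        if gene_id in seen:
            continue
        seen.add(gene_id)
        ids.append(gene_id)
    # Assumption: genes_to_use caps how many genes are sent to gget.
    return ids if genes_to_use is None else ids[:genes_to_use]


rows = [
    {"gene": "ENSG00000157764"},
    {"gene": "ENSG00000141510"},
    {"gene": "ENSG00000157764"},
    {"gene": None},
    {"gene": "ENSG00000012048"},
]
first_two = select_gene_ids(rows, "gene", genes_to_use=2)
all_ids = select_gene_ids(rows, "gene")
```

With pandas, `df.to_dict("records")` yields rows in this shape. The resulting IDs could then be passed to `gget.info`, which accepts a list of Ensembl IDs.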