
mecfs_bio.build_system.task.get_uniprot_reference_data_task

Task to download reference data on reviewed human proteins from UniProtKB.


DEFAULT_FIELDS module-attribute

DEFAULT_FIELDS = [
    "accession",
    "id",
    "gene_names",
    "protein_name",
    "organism_name",
    "organism_id",
    "go_id",
    "go_p",
    "go_c",
    "go_f",
    "cc_subcellular_location",
    "cc_function",
    "xref_ensembl",
    "xref_reactome",
]
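Each name in DEFAULT_FIELDS is a UniProtKB return field. The sketch below builds a UniProt REST search URL to show what these names look like on the wire. We have not checked how ProtKB builds its requests. The sketch assumes only the public REST endpoint and its `query`, `fields`, and `format` parameters, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Copy of DEFAULT_FIELDS from this module.
DEFAULT_FIELDS = [
    "accession", "id", "gene_names", "protein_name", "organism_name",
    "organism_id", "go_id", "go_p", "go_c", "go_f",
    "cc_subcellular_location", "cc_function", "xref_ensembl", "xref_reactome",
]

# Public UniProtKB REST search endpoint (not part of this module).
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, fields: list[str], fmt: str = "tsv") -> str:
    """Hypothetical helper: a UniProtKB search URL returning only `fields`."""
    params = {
        "query": query,
        # The REST API takes return fields as one comma-separated string.
        "fields": ",".join(fields),
        "format": fmt,
    }
    # urlencode percent-escapes ':' and ',' and turns spaces into '+'.
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


url = build_search_url("reviewed:true AND organism_name:human", DEFAULT_FIELDS)
print(url)
```

The search endpoint paginates its results. For a bulk download, UniProt also offers a `/uniprotkb/stream` endpoint that accepts the same parameters.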

protkb module-attribute

protkb = ProtKB()

GetUniProtReferenceDataTask

Bases: Task

Task that downloads reviewed (Swiss-Prot) human protein entries from UniProtKB and saves them as a Parquet file.


deps property

deps: list[Task]

field_list class-attribute instance-attribute

field_list: list[str] = DEFAULT_FIELDS

meta property

meta: Meta

create classmethod

create(asset_id: str)
Source code in mecfs_bio/build_system/task/get_uniprot_reference_data_task.py
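@classmethod
def create(cls, asset_id: str):
    # Only the asset id comes from the caller. Group, sub-group,
    # sub-folder and the Parquet extension are fixed here.
    meta = ReferenceFileMeta(
        group="protein_lookup",
        sub_group="uniprot",
        sub_folder=PurePath("raw"),
        id=AssetId(asset_id),
        extension=".parquet",
        # Tells consumers to load the asset as a Parquet DataFrame.
        read_spec=DataFrameReadSpec(DataFrameParquetFormat()),
    )
    return cls(meta)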
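`create` is a factory. The caller supplies an asset id, and every other piece of storage metadata is pinned inside the method. The sketch below mimics that pattern with hypothetical stand-in dataclasses. `StandInMeta`, `StandInUniProtTask`, and both asset ids are invented for illustration. The real `ReferenceFileMeta` also carries a `read_spec` and wraps the id in `AssetId`, and both are omitted here.

```python
from dataclasses import dataclass, replace
from pathlib import PurePath


@dataclass(frozen=True)
class StandInMeta:
    """Hypothetical stand-in for ReferenceFileMeta (fields copied from create)."""

    group: str
    sub_group: str
    sub_folder: PurePath
    id: str
    extension: str


@dataclass(frozen=True)
class StandInUniProtTask:
    """Hypothetical stand-in for GetUniProtReferenceDataTask."""

    meta: StandInMeta

    @classmethod
    def create(cls, asset_id: str) -> "StandInUniProtTask":
        # Only the asset id comes from the caller; the rest is pinned here.
        meta = StandInMeta(
            group="protein_lookup",
            sub_group="uniprot",
            sub_folder=PurePath("raw"),
            id=asset_id,
            extension=".parquet",
        )
        return cls(meta)


a = StandInUniProtTask.create("uniprot_human_reviewed")
b = StandInUniProtTask.create("uniprot_human_reviewed_v2")
# Swapping in b's id makes a's metadata identical to b's.
print(replace(a.meta, id=b.meta.id) == b.meta)  # True
```

Because the factory pins these values, two tasks can differ only in their id. That keeps every UniProt snapshot in the same group and file format.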

execute

execute(scratch_dir: Path, fetch: Fetch, wf: WF) -> Asset
Source code in mecfs_bio/build_system/task/get_uniprot_reference_data_task.py
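def execute(self, scratch_dir: Path, fetch: Fetch, wf: WF) -> Asset:
    # fetch and wf are not used: the data comes straight from UniProt.
    # Restrict to reviewed (Swiss-Prot) human entries.
    query = reviewed(True) & organism_name("human")
    # Request only the columns listed in field_list.
    result = protkb.get(query, fields=self.field_list)
    # Write into the scratch directory and return the file as the task's asset.
    out_path = scratch_dir / "uniprot.parquet"
    result.to_parquet(out_path)
    return FileAsset(out_path)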
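`execute` builds its filter by combining two clauses with `&`. We assume, without having checked the library source, that `reviewed` and `organism_name` return clause objects that overload `&` into a boolean UniProt query. The sketch below shows that operator-overloading pattern with a hypothetical `Query` class and hypothetical clause functions. The exact string the real library produces may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query clause; not the library's implementation."""

    text: str

    def __and__(self, other: Query) -> Query:
        # Combine two clauses with UniProt's AND, parenthesizing each side.
        return Query(f"({self.text}) AND ({other.text})")

    def __str__(self) -> str:
        return self.text


def reviewed(flag: bool) -> Query:
    # True selects reviewed (Swiss-Prot) entries, False unreviewed (TrEMBL).
    return Query(f"reviewed:{str(flag).lower()}")


def organism_name(name: str) -> Query:
    return Query(f"organism_name:{name}")


query = reviewed(True) & organism_name("human")
print(query)  # (reviewed:true) AND (organism_name:human)
```

Parenthesizing each operand keeps the grouping correct when combined clauses are nested, for example `(a & b) & c`.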