In a recent study published in the journal Nucleic Acids Research, researchers investigated whether machine learning can identify pan-cancer mutational hotspots at persistent CCCTC-binding factor (CTCF) binding sites (P-CTCFBSs).

Study: Machine learning enables pan-cancer identification of mutational hotspots at persistent CTCF binding sites. Image Credit: Nuttapong punna / Shutterstock.com

Mutations at CTCF binding sites (CTCF-BSs) in non-coding deoxyribonucleic acid (DNA) affect CTCF, a protein that regulates transcription and nuclear architecture. Persistent CTCF-BSs remain bound after CTCF knockdown and show conserved binding. This subtype is distinguished by higher binding strength, constitutive binding, and enrichment at chromatin loop anchors and topologically associating domain (TAD) boundaries.

Mutations in CTCF binding sites can activate oncogenes; however, few such mutations have been identified. In the present study, researchers developed CTCF-In-Silico Investigation of PersisTEnt Binding (CTCF-INSITE), a machine learning tool that predicts whether CTCF binding persists following knockdown in cancer cells. To make this prediction, CTCF-INSITE assesses both the genetic and the epigenetic characteristics associated with persistent binding.

The mutational load at P-CTCF binding sites (P-CTCFBSs) was determined using International Cancer Genome Consortium (ICGC) sequences from matched tumors, after generating persistence metrics for the Encyclopedia of DNA Elements (ENCODE).
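To make the idea of mutational load concrete, here is a minimal sketch that counts how many mutation positions fall inside a set of binding-site intervals and normalizes by the total length of those intervals. It is not the study's pipeline: the function name `mutational_load`, the half-open 0-based coordinate convention, the per-kilobase normalization, and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def mutational_load(sites, mutations):
    """Return mutations per kilobase across the given binding sites.

    sites:     list of (chrom, start, end), half-open and assumed non-overlapping
    mutations: list of (chrom, pos), 0-based positions
    Both input layouts are hypothetical, chosen only for this sketch.
    """
    # Group the intervals by chromosome and sort them by start coordinate.
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    hits = 0
    for chrom, pos in mutations:
        if chrom not in by_chrom:
            continue
        # Find the last interval starting at or before pos, then check its end.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            hits += 1

    total_bp = sum(end - start for _, start, end in sites)
    return 1000.0 * hits / total_bp if total_bp else 0.0


# Toy data, invented for illustration only (not real binding sites or mutations).
sites = [("chr1", 100, 120), ("chr1", 500, 520), ("chr2", 50, 70)]
mutations = [("chr1", 105), ("chr1", 119), ("chr1", 120),
             ("chr1", 510), ("chr2", 49), ("chr3", 10)]
print(mutational_load(sites, mutations))  # 3 hits over 60 bp -> 50.0 per kb
```

Comparing this value between persistent and non-persistent sites is one simple way to ask whether a class of sites carries more mutations than expected. A real analysis would also need to account for factors such as cohort size and background mutation rate.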