In advanced process nodes, the severe decoupling between SRAM scaling stagnation and logic circuit scaling, combined with the surging on-chip memory demands from Large Language Model (LLM) training ...