Power users who prefer command-line tools like wget or curl, or download managers like aria2c, can grab the files directly from the hosting source (usually The Eye or a similar archival host).
zstd -d pubmed_central.jsonl.zst
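Once the archive is decompressed, the result is a JSON Lines file that can be read one record at a time instead of being loaded into memory all at once. The sketch below is a minimal illustration using only the standard library. It assumes the decompressed file is named pubmed_central.jsonl (zstd -d drops the .zst suffix by default) and that each record carries a "text" field; the helper name iter_documents is hypothetical, and both assumptions should be checked against your copy of the data.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_documents(path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line of a JSONL file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


# Assumed file name: the output of `zstd -d pubmed_central.jsonl.zst`.
SAMPLE_PATH = "pubmed_central.jsonl"

if __name__ == "__main__" and Path(SAMPLE_PATH).exists():
    for index, record in enumerate(iter_documents(SAMPLE_PATH)):
        # "text" is an assumed field name; .get() avoids a KeyError if absent.
        print(str(record.get("text", ""))[:200])
        if index >= 2:  # preview only the first three records
            break
```

Because the function is a generator, it holds only one line in memory at a time, which matters for files of this size.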
from pile import Pile
Alternatively, download the .torrent file from the-eye.eu or huggingface.co/datasets/EleutherAI/the_pile.