SoundCloud Scraper for AI Training

A Node.js-based SoundCloud scraper designed for collecting audio data to train AI models. Includes three specialized scripts for music generation, audio enhancement, and speech processing use cases.

Features

Playlist scraping. Extract metadata from SoundCloud playlists including artist names, track titles, play counts, and URLs.
Search results scraping. Find Creative Commons licensed tracks with download availability filtering.
Profile scraping. Collect podcast episodes and long-form spoken content from user profiles.
Proxy support. Built-in residential proxy integration (Decodo) to reduce IP blocking while scraping.
Headless browser automation. Uses Puppeteer to handle JavaScript-heavy pages and dynamic content.
Download detection. Automatically identifies tracks with enabled download buttons.

Installation

Prerequisites

Node.js 14+ and npm
Decodo residential proxy credentials (see Proxy setup)

Setup

git clone https://github.com/Decodo/soundcloud-scraper.git
cd soundcloud-scraper
npm install

Configure proxies

Get your proxy credentials from the Decodo dashboard and update them in each script file:

await page.authenticate({
  username: 'YOUR_PROXY_USERNAME',  // Replace with your username
  password: 'YOUR_PROXY_PASSWORD'   // Replace with your password
});
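Hard-coding credentials in every script makes them easy to commit by accident. One alternative, sketched below, is to read them from environment variables. `DECODO_USERNAME`, `DECODO_PASSWORD`, and `getProxyConfig()` are hypothetical names that the scripts do not currently use; the proxy endpoint is the one listed under Configuration.

```javascript
// Minimal sketch: build Puppeteer proxy settings from environment variables.
// DECODO_USERNAME / DECODO_PASSWORD are hypothetical variable names.
const PROXY_SERVER = 'http://gate.decodo.com:7000';

function getProxyConfig(env = process.env) {
  const username = env.DECODO_USERNAME;
  const password = env.DECODO_PASSWORD;
  if (!username || !password) {
    throw new Error('Set DECODO_USERNAME and DECODO_PASSWORD before running the scraper');
  }
  return {
    // Options for puppeteer.launch(): route browser traffic through the proxy
    launchOptions: { headless: true, args: [`--proxy-server=${PROXY_SERVER}`] },
    // Credentials for page.authenticate()
    credentials: { username, password },
  };
}

// Usage inside a script:
// const { launchOptions, credentials } = getProxyConfig();
// const browser = await puppeteer.launch(launchOptions);
// const page = await browser.newPage();
// await page.authenticate(credentials);
```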

Usage

1. Music generation AI training

Collect data for training music generation models similar to Suno AI, AIVA, or Stable Audio, using curated playlists that represent successful musical patterns across genres.

File: 1-music-generation.js

Scrape trending playlists to collect metadata for training music generation models.

node 1-music-generation.js

What it does:

Targets SoundCloud playlist pages
Extracts artist names, track titles, and play counts
Outputs structured data with rankings

Customize the target:

// Edit line 43 in 1-music-generation.js: replace the playlist URL
await page.goto('https://soundcloud.com/YOUR-PLAYLIST-URL', {
  // ...keep the script's existing navigation options
});

Output example:

Found 50 playlist tracks

1. Artist Name - "Track Title" (1.2M plays)
   https://soundcloud.com/artist/track
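The ranked output above can be produced from plain track objects. This sketch assumes the extraction step yields objects with hypothetical fields `artist`, `title`, `plays` (the abbreviated display text, such as `1.2M`), and `url`, and that "rank" simply means position in the playlist. `parsePlayCount()`, `toRankedRecords()`, and `formatRankedTracks()` are illustrative helpers, not functions from the script.

```javascript
// Convert abbreviated counts ("1.2M", "850K", "1,234") into numbers; null if unrecognized.
function parsePlayCount(text) {
  const match = String(text).trim().replace(/,/g, '').match(/^([\d.]+)\s*([KMB])?$/i);
  if (!match) return null;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const value = parseFloat(match[1]);
  const suffix = match[2] ? match[2].toUpperCase() : null;
  return Math.round(suffix ? value * multipliers[suffix] : value);
}

// Structured records with a rank and a numeric play count.
function toRankedRecords(tracks) {
  return tracks.map((track, index) => ({
    rank: index + 1, // Assumed: rank = position in the playlist
    artist: track.artist,
    title: track.title,
    plays: parsePlayCount(track.plays),
    url: track.url,
  }));
}

// Console report in the same shape as the example output.
function formatRankedTracks(tracks) {
  const lines = [`Found ${tracks.length} playlist tracks`, ''];
  tracks.forEach((track, index) => {
    lines.push(`${index + 1}. ${track.artist} - "${track.title}" (${track.plays} plays)`);
    lines.push(`   ${track.url}`);
  });
  return lines.join('\n');
}
```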

2. Audio enhancement AI training

Collect Creative Commons tracks to train models that clean up degraded recordings, remove noise, and restore audio quality, similar to Adobe Enhance Speech or Descript.

File: 2-audio-enhancement.js

Find Creative Commons tracks with downloads enabled, for training audio cleanup models.

node 2-audio-enhancement.js

What it does:

Searches SoundCloud with custom queries
Auto-scrolls to load more results
Filters for tracks with the download button enabled

Customize the search:

// Edit line 49 in 2-audio-enhancement.js: replace the search query
await page.goto('https://soundcloud.com/search/sounds?q=YOUR-SEARCH-QUERY', {
  // ...keep the script's existing navigation options
});
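Because the query is part of the URL, multi-word or special-character queries need URL encoding. A small hypothetical helper using Node's built-in `URL` class, which encodes spaces as `+` and escapes characters such as `&`:

```javascript
// Build a SoundCloud track-search URL with a properly encoded query string.
function buildSearchUrl(query) {
  const url = new URL('https://soundcloud.com/search/sounds');
  url.searchParams.set('q', query); // Form-encodes the query (space -> "+", "&" -> "%26")
  return url.toString();
}

// Usage inside the script (hypothetical query):
// await page.goto(buildSearchUrl('creative commons ambient'), { /* existing options */ });
```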

Output example:

Total items found: 85
Found 42 downloadable tracks:

1. Artist Name - "Track Title"
   https://soundcloud.com/artist/track
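Download detection and the summary above amount to filtering the scraped items. A sketch, assuming each item has already been extracted into an object with a hypothetical boolean `downloadable` field (in the real script, this would come from checking whether the track's download button is present and enabled); `summarizeDownloadable()` is an illustrative name.

```javascript
// Keep only tracks whose download button was detected as enabled,
// and build a report in the same shape as the example output.
function summarizeDownloadable(items) {
  const downloadable = items.filter((item) => item.downloadable === true);
  const lines = [
    `Total items found: ${items.length}`,
    `Found ${downloadable.length} downloadable tracks:`,
    '',
  ];
  downloadable.forEach((track, i) => {
    lines.push(`${i + 1}. ${track.artist} - "${track.title}"`);
    lines.push(`   ${track.url}`);
  });
  return { downloadable, report: lines.join('\n') };
}
```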

3. Speech/voice AI training

Extract podcast episodes to train speech recognition, voice cloning, and natural language processing models, such as OpenAI's Whisper or ElevenLabs' voice models.

File: 3-voice-training.js

Extract podcast episodes and lectures for speech recognition and voice AI training.

node 3-voice-training.js

What it does:

Scrapes user profile pages
Focuses on long-form spoken content
Identifies downloadable episodes

Customize the target:

// Edit line 43 in 3-voice-training.js: replace the profile name
await page.goto('https://soundcloud.com/YOUR-PROFILE/tracks', {
  // ...keep the script's existing navigation options
});

Configure limits:

// Edit lines 62-63 in 3-voice-training.js
const maxTracksToScrape = 50;  // Maximum number of tracks to process
const maxScrollAttempts = 20;  // Maximum number of scrolls (pagination depth)
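The two limits work together: scrolling triggers SoundCloud's lazy loading until either no new tracks appear or `maxScrollAttempts` is reached, and at most `maxTracksToScrape` results are then kept. A sketch of that loop against a Puppeteer `page`; the `countItems` callback (which would count track elements on the page), the `waitMs` pause, and the `autoScroll()` name are hypothetical, since the scripts' actual selectors and timings are not shown here.

```javascript
// Scroll until the item count stops growing or the attempt limit is reached.
// countItems: async () => number (hypothetical, e.g. counts track list elements)
async function autoScroll(page, countItems, { maxScrollAttempts = 20, waitMs = 1500 } = {}) {
  let previousCount = await countItems();
  for (let attempt = 0; attempt < maxScrollAttempts; attempt++) {
    // Scroll to the bottom (runs in the browser) to load the next batch
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    const currentCount = await countItems();
    if (currentCount === previousCount) break; // Nothing new loaded; stop early
    previousCount = currentCount;
  }
  return previousCount;
}

// After scrolling, cap the number of tracks processed:
// const tracks = allTracks.slice(0, maxTracksToScrape);
```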

Output example:

Found 68 total tracks
24 downloadable tracks:

1. Podcast Name - "Episode Title"
   https://soundcloud.com/podcast/episode
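How the script decides what counts as long-form spoken content is not documented here. One plausible approach, offered purely as an assumption, is a duration threshold: parse the track length as displayed (for example `1:02:15` or `45:30`) and keep only tracks above a minimum. `parseDuration()`, `isLongForm()`, and the 20-minute default are hypothetical.

```javascript
// Convert "H:MM:SS" or "M:SS" duration text into seconds; null if unparseable.
function parseDuration(text) {
  const parts = String(text).trim().split(':');
  if (parts.length < 2 || parts.length > 3) return null;
  const numbers = parts.map(Number);
  if (numbers.some((n) => !Number.isInteger(n) || n < 0)) return null;
  // Fold left: each step multiplies the running total by 60 and adds the next part
  return numbers.reduce((total, n) => total * 60 + n, 0);
}

// Hypothetical rule: anything at least minMinutes long counts as long-form content.
function isLongForm(durationText, minMinutes = 20) {
  const seconds = parseDuration(durationText);
  return seconds !== null && seconds >= minMinutes * 60;
}
```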

Configuration

Each script includes configurable parameters at the top of the file:

Proxy settings:

// Passed to puppeteer.launch() in the args array
'--proxy-server=http://gate.decodo.com:7000'  // Proxy endpoint
// Passed to page.authenticate()
username: 'YOUR_PROXY_USERNAME'               // Your credentials
password: 'YOUR_PROXY_PASSWORD'

Scraping behavior (default values vary by script):

headless: true          // Run browser invisibly
timeout: 45000          // Page load timeout (ms)
maxScrollAttempts: 5    // Pagination depth
targetResults: 500      // Result limit

Resource blocking:

// Scripts automatically block images, media, fonts
// to improve performance and reduce bandwidth
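Resource blocking of this kind is typically done with Puppeteer's request interception. A sketch, assuming the blocked set matches the comment above (images, media, fonts); the helper name `blockHeavyResources()` is hypothetical, while `page.setRequestInterception()`, `request.resourceType()`, `request.abort()`, and `request.continue()` are standard Puppeteer calls.

```javascript
// Resource types to skip; values follow Puppeteer's resourceType() names.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

async function blockHeavyResources(page) {
  // Once enabled, every request must be explicitly continued or aborted
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (BLOCKED_TYPES.has(request.resourceType())) {
      request.abort(); // Drop images, audio/video, and fonts to save bandwidth
    } else {
      request.continue();
    }
  });
}
```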

Best practices

Start with small limits (10-20 items) to test your setup
Use residential proxies to reduce the risk of IP bans
Add delays between requests to respect rate limits
Monitor console output for errors and warnings
Store credentials securely, never commit them to Git
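For the advice to add delays between requests, a randomized pause (jitter) makes request timing less regular than a fixed sleep. A minimal sketch; `randomDelay()` and the 2 to 5 second range in the usage comment are illustrative assumptions, not values taken from the scripts.

```javascript
// Wait a random whole number of milliseconds in [minMs, maxMs] and resolve with it.
// `random` is injectable so the behavior can be tested deterministically.
function randomDelay(minMs, maxMs, random = Math.random) {
  const ms = Math.floor(minMs + random() * (maxMs - minMs + 1));
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}

// Usage between page visits (hypothetical loop):
// for (const url of urls) {
//   await page.goto(url);
//   await randomDelay(2000, 5000);
// }
```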

Proxy setup

Get residential proxies from Decodo:

Sign up at dashboard.decodo.com
Navigate to Residential Proxies → Proxy Setup
Copy your Username and Password
Update credentials in each script file

Troubleshooting

Browser won't launch?

Verify Node.js 14+ is installed
Reinstall dependencies (including Puppeteer) with npm install
Set headless: false to watch the browser window

Getting blocked?

Check proxy credentials are correct
Increase delays between requests
Verify proxy quota on Decodo dashboard

No results found?

SoundCloud's HTML may have changed; the script's selectors may need updating
Check target URL is accessible
Review browser console output

Documentation

Related projects

🗺️ Google Maps Scraper

🔍 Google Lens Scraper

📰 Google News Scraper

💬 Reddit Scraper

