A Node.js-based SoundCloud scraper designed for collecting audio data to train AI models. Includes three specialized scripts for music generation, audio enhancement, and speech processing use cases.
- Playlist scraping. Extract metadata from SoundCloud playlists including artist names, track titles, play counts, and URLs.
- Search results scraping. Find Creative Commons licensed tracks with download availability filtering.
- Profile scraping. Collect podcast episodes and long-form spoken content from user profiles.
- Proxy support. Built-in residential proxy integration for reliable, undetected scraping.
- Headless browser automation. Uses Puppeteer to handle JavaScript-heavy pages and dynamic content.
- Download detection. Automatically identifies tracks with enabled download buttons.
git clone https://github.com/Decodo/soundcloud-scraper.git
cd soundcloud-scraper
npm install
Get your proxy credentials from the Decodo dashboard and update them in each script file:
await page.authenticate({
  username: 'YOUR_PROXY_USERNAME', // Replace with your username
  password: 'YOUR_PROXY_PASSWORD' // Replace with your password
});
Train models like Suno AI, AIVA, or Stable Audio using curated playlists that represent successful musical patterns across different genres.
File: music-generation.js
Scrape trending playlists to collect metadata for training music generation models.
node music-generation.js
What it does:
- Targets SoundCloud playlist pages
- Extracts artist names, track titles, play counts
- Outputs structured data with rankings
Customize the target:
// Edit line 43 in music-generation.js
await page.goto('https://soundcloud.com/YOUR-PLAYLIST-URL', {
Output example:
Found 50 playlist tracks
1. Artist Name - "Track Title" (1.2M plays)
https://soundcloud.com/artist/track
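The sample output prints play counts in abbreviated form ("1.2M plays"). If the counts are needed as numbers for ranking or filtering, a small converter can help. The following is a minimal sketch: `parsePlayCount` is a hypothetical helper that is not part of the repository's scripts, and it assumes counts use K/M/B suffixes as in the example above.

```javascript
// Hypothetical helper (not part of the repository's scripts): converts an
// abbreviated play count such as "1.2M plays" into a plain number.
// Assumes K/M/B suffixes written directly after the digits.
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };

function parsePlayCount(text) {
  const match = /([\d,.]+)([KMB])?/i.exec(String(text));
  if (!match) return null; // No digits found

  const value = parseFloat(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return null;

  const suffix = match[2] ? match[2].toUpperCase() : null;
  return Math.round(value * (suffix ? MULTIPLIERS[suffix] : 1));
}

console.log(parsePlayCount('1.2M plays')); // 1200000
```

Rounding the result avoids fractional artifacts when a decimal value is multiplied by a large suffix.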
Collect Creative Commons tracks to train models that clean degraded recordings, remove noise, and restore audio quality like Adobe Enhance Speech or Descript.
File: audio-enhancement.js
Find Creative Commons tracks with download availability for audio cleanup model training.
node audio-enhancement.js
What it does:
- Searches SoundCloud with custom queries
- Auto-scrolls to load more results
- Filters tracks with download buttons enabled
Customize the search:
// Edit line 49 in audio-enhancement.js
await page.goto('https://soundcloud.com/search/sounds?q=YOUR-SEARCH-QUERY', {
Output example:
Total items found: 85
Found 42 downloadable tracks:
1. Artist Name - "Track Title"
https://soundcloud.com/artist/track
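Search queries with spaces or special characters need to be URL-encoded before they replace `YOUR-SEARCH-QUERY`. A minimal sketch, assuming a hypothetical `buildSearchUrl` helper (the base URL is the one used in audio-enhancement.js; the helper itself is not part of the script):

```javascript
// Hypothetical helper: builds the SoundCloud search URL from a free-text query.
const SEARCH_BASE = 'https://soundcloud.com/search/sounds';

function buildSearchUrl(query) {
  // encodeURIComponent escapes spaces, ampersands, and other reserved characters
  return `${SEARCH_BASE}?q=${encodeURIComponent(query.trim())}`;
}

console.log(buildSearchUrl('ambient field recordings'));
// https://soundcloud.com/search/sounds?q=ambient%20field%20recordings
```

The returned string can be passed directly as the first argument to `page.goto`.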
Extract podcast episodes to train speech recognition, voice cloning, and natural language processing models like Whisper (OpenAI) or ElevenLabs.
File: voice-training.js
Extract podcast episodes and lectures for speech recognition and voice AI training.
node voice-training.js
What it does:
- Scrapes user profile pages
- Focuses on long-form spoken content
- Identifies downloadable episodes
Customize the target:
// Edit line 43 in voice-training.js
await page.goto('https://soundcloud.com/YOUR-PROFILE/tracks', {
Configure limits:
// Edit lines 62-63 in voice-training.js
const maxTracksToScrape = 50; // Maximum tracks to process
const maxScrollAttempts = 20; // Scroll depth
Output example:
Found 68 total tracks
24 downloadable tracks:
1. Podcast Name - "Episode Title"
https://soundcloud.com/podcast/episode
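The two limits, `maxTracksToScrape` and `maxScrollAttempts`, bound how far the script scrolls before it stops collecting. The sketch below shows one way such a loop could work; it is not the code from voice-training.js. `scrollForTracks`, the `countTracks` callback, and the `pauseMs` default are all assumptions for illustration.

```javascript
// Sketch of a scroll loop bounded by both limits. `countTracks` is a
// hypothetical callback that returns how many tracks are currently loaded;
// the real script may count results differently.
async function scrollForTracks(page, countTracks, options = {}) {
  const {
    maxTracksToScrape = 50,
    maxScrollAttempts = 20,
    pauseMs = 1500, // Assumed wait for lazy-loaded results
  } = options;

  let loaded = await countTracks(page);
  let attempts = 0;

  while (loaded < maxTracksToScrape && attempts < maxScrollAttempts) {
    // Jump to the bottom of the page to trigger the next batch of results
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((resolve) => setTimeout(resolve, pauseMs));

    const after = await countTracks(page);
    attempts += 1;
    if (after === loaded) break; // Nothing new appeared, so stop early
    loaded = after;
  }

  return { loaded: Math.min(loaded, maxTracksToScrape), attempts };
}
```

Stopping when the count no longer grows keeps the script from using all scroll attempts on a profile with only a few tracks.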
Each script includes configurable parameters at the top of the file:
Proxy settings:
'--proxy-server=http://gate.decodo.com:7000' // Proxy endpoint
username: 'YOUR_PROXY_USERNAME' // Your credentials
password: 'YOUR_PROXY_PASSWORD'
Scraping behavior:
headless: true // Run browser invisibly
timeout: 45000 // Page load timeout (ms)
maxScrollAttempts: 5 // Pagination depth
targetResults: 500 // Result limit
Resource blocking:
// Scripts automatically block images, media, fonts
// to improve performance and reduce bandwidth
- Start with small limits (10-20 items) to test your setup
- Use residential proxies to avoid IP bans
- Add delays between requests to respect rate limits
- Monitor console output for errors and warnings
- Store credentials securely, never commit them to Git
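The resource blocking noted in the configuration comments (images, media, fonts) is commonly done in Puppeteer with request interception. The sketch below shows one way to do it and may differ from what the scripts actually contain; `shouldBlock` and `blockHeavyResources` are hypothetical names.

```javascript
// Sketch of request interception that skips heavy resources. The blocked
// types follow the configuration comment: images, media, fonts.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

async function blockHeavyResources(page) {
  // Interception must be enabled before requests can be aborted or continued
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType())) {
      request.abort(); // Drop the request before any bytes are downloaded
    } else {
      request.continue();
    }
  });
}
```

Call `await blockHeavyResources(page)` before `page.goto` so the handler is in place when the first requests are issued.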
Get residential proxies from Decodo:
- Sign up at dashboard.decodo.com
- Navigate to Residential Proxies → Proxy Setup
- Copy your Username and Password
- Update credentials in each script file
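To keep credentials out of Git, one option is to read them from environment variables instead of pasting them into each script. This is a minimal sketch, not the repository's approach: `loadProxyCredentials` and the variable names `DECODO_PROXY_USERNAME` and `DECODO_PROXY_PASSWORD` are assumptions.

```javascript
// Sketch: read proxy credentials from environment variables rather than
// hard-coding them. The variable names are assumed; the scripts do not
// define them.
function loadProxyCredentials(env = process.env) {
  const username = env.DECODO_PROXY_USERNAME;
  const password = env.DECODO_PROXY_PASSWORD;
  if (!username || !password) {
    throw new Error(
      'Set DECODO_PROXY_USERNAME and DECODO_PROXY_PASSWORD before running the scraper'
    );
  }
  return { username, password };
}

// Usage inside a script, replacing the hard-coded placeholders:
//   await page.authenticate(loadProxyCredentials());
```

Failing fast on a missing variable gives a clearer error than a proxy authentication failure later in the run.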
Browser won't launch?
- Verify Node.js 14+ is installed
- Install Puppeteer: npm install
- Try setting headless: false to see the browser
Getting blocked?
- Check proxy credentials are correct
- Increase delays between requests
- Verify proxy quota on Decodo dashboard
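One simple way to increase delays is a randomized pause between page loads, so requests do not arrive at a fixed rhythm. The sketch below assumes hypothetical helpers `pickDelay` and `politeDelay`; the 2 to 5 second default range is an illustrative guess, not a documented SoundCloud limit.

```javascript
// Sketch of a randomized pause between page loads.
// `random` is injectable so the delay calculation can be tested.
function pickDelay(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

function politeDelay(minMs = 2000, maxMs = 5000) {
  const ms = pickDelay(minMs, maxMs);
  // Resolves with the chosen delay so callers can log it if they want
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}

// Usage between navigations:
//   await page.goto(url);
//   await politeDelay();
```

If blocks persist, widen the range rather than removing the randomness.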
No results found?
- SoundCloud's HTML may have changed
- Check target URL is accessible
- Review browser console output
