Use Playwright and ISP proxies to scrape Zillow’s real estate data while reducing the risk of blocks and captchas.
In today’s data-driven economy, the ability to access and analyze high-quality web data can make or break strategic decisions—whether you’re monitoring competitor prices, tracking real estate market trends, or gathering property listings. However, not all data is easy to retrieve. Many websites rely on dynamic content generation, client-side rendering, and interactive elements that challenge traditional HTTP-based scraping techniques.
Playwright emerges as a powerful solution to these modern complexities. Developed by Microsoft and open-sourced for the broader community, Playwright isn’t just another browser automation library; it’s a robust, full-featured framework designed to test, interact with, and extract data from any website—no matter how sophisticated its front-end technologies may be.
At its core, Playwright is an end-to-end testing framework that enables developers to automate browser actions programmatically. It supports multiple browser engines (Chromium, Firefox, and WebKit) and offers a rich API for navigating pages, clicking buttons, filling out forms, waiting for elements to render, and capturing screenshots or PDFs.
Because Playwright can replicate a real user’s browsing experience so faithfully, it has also become an invaluable tool for web-scraping scenarios that traditional scraping methods struggle with. Instead of just parsing raw HTML, Playwright launches a full browser environment (headless or headed) and lets you work with the page as it actually renders.
For example, rather than relying on arbitrary setTimeout calls, Playwright’s built-in waiting mechanisms let you instruct the scraper to pause until specific elements appear or conditions are met. This reduces flakiness and improves data quality.
While web scraping is one of the standout applications of Playwright, it’s by no means the only one; as noted above, it is first and foremost an end-to-end testing framework.
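The condition-based waiting described above can be sketched as a small helper. This is a minimal sketch, assuming a Playwright page object is passed in; the function name waitForListings, the selector argument, and the 15-second default timeout are illustrative choices rather than anything prescribed by Playwright.

```javascript
/**
 * Waits until at least one element matching `selector` is visible,
 * then returns how many matching elements are on the page.
 * `page` is assumed to be a Playwright Page; the helper name and
 * default timeout are illustrative.
 */
async function waitForListings(page, selector, timeoutMs = 15000) {
  const cards = page.locator(selector);
  // Resolves once the first match is visible; rejects if the timeout elapses
  await cards.first().waitFor({ state: 'visible', timeout: timeoutMs });
  return cards.count();
}
```

Because the helper waits on a condition instead of a fixed delay, it returns as soon as the content is ready and fails with a clear timeout error when it never appears.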
Even the most capable scraper can encounter obstacles when dealing with rate-limiting or advanced anti-bot mechanisms. High-volume requests from a single IP may raise red flags or trigger captchas. ISP IPs—such as those offered by certain proxy providers—help your traffic blend in with typical user behavior, reducing the risk of suspicion or blocking.
Key Advantages:
Some providers don’t automatically “rotate” these IPs for you. Instead, they supply multiple ISP endpoints, which you can manually rotate in your code by cycling through each proxy address.
const { chromium } = require('playwright');

/**
 * A sample list of ISP proxy endpoints. Each entry has credentials and a unique endpoint.
 */
const proxyList = [
  'http://username:password@proxy1.statproxies.com:3128',
  'http://username:password@proxy2.statproxies.com:3128',
  'http://username:password@proxy3.statproxies.com:3128'
  // ... more proxies if you have them
];

/**
 * Chromium's --proxy-server flag does not accept embedded credentials, so split
 * each URL into the { server, username, password } shape of Playwright's
 * `proxy` launch option.
 */
function toProxyOption(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    server: `${url.protocol}//${url.host}`,
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password)
  };
}

(async () => {
  /**
   * For demonstration, we'll iterate over each proxy in the list.
   * In a real-world scenario, you might choose to rotate proxies:
   * - After each page load
   * - After each batch of requests
   * - Or based on a time-based interval
   */
  for (const proxyUrl of proxyList) {
    const proxy = toProxyOption(proxyUrl);
    // Log only the endpoint so credentials stay out of the console
    console.log(`Using proxy: ${proxy.server}`);

    // Launch a new Playwright browser instance using the current proxy
    const browser = await chromium.launch({ headless: true, proxy });
    try {
      const context = await browser.newContext();
      const page = await context.newPage();

      // Navigate to Zillow search results for San Francisco.
      // 'domcontentloaded' is enough here because we wait for the listings explicitly below.
      await page.goto('https://www.zillow.com/homes/for_sale/San-Francisco,-CA_rb/', { waitUntil: 'domcontentloaded' });

      // Wait for listings to appear (these class names depend on Zillow's markup and may change)
      await page.waitForSelector('.list-card-info');

      // Extract property data
      const listings = await page.$$eval('.list-card-info', (cards) =>
        cards.map(card => {
          const address = card.querySelector('.list-card-addr')?.textContent.trim();
          const price = card.querySelector('.list-card-price')?.textContent.trim();
          const details = card.querySelector('.list-card-details')?.textContent.trim();
          const link = card.querySelector('a.list-card-link')?.href;
          return { address, price, details, link };
        })
      );
      console.log(`Found ${listings.length} listings using ${proxy.server}`);

      // Print one example so we know what we got
      if (listings.length > 0) {
        console.log('Example listing:', listings[0]);
      }
    } catch (err) {
      // A blocked or unreachable proxy should not stop the remaining ones
      console.error(`Failed with ${proxy.server}: ${err.message}`);
    } finally {
      // Always release the browser, even when navigation or extraction fails
      await browser.close();
    }
  }
})();
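The script above visits the same page once per proxy. To rotate after each page load or batch instead, as its comments suggest, one option is a small round-robin helper. This is a sketch only: createProxyRotator is a hypothetical helper (not a Playwright API), and the example.com endpoints are placeholders for whatever your provider supplies.

```javascript
/**
 * Returns a function that hands out proxies in round-robin order.
 * `createProxyRotator` is a hypothetical helper, not part of Playwright.
 */
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('At least one proxy endpoint is required');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around to the first endpoint
    return proxy;
  };
}

// Placeholder endpoints; substitute the ones your provider supplies
const nextProxy = createProxyRotator([
  'http://proxy-a.example.com:3128',
  'http://proxy-b.example.com:3128'
]);

// Each call yields the next endpoint, cycling back to the start when the list ends
for (let request = 1; request <= 3; request++) {
  console.log(`Request ${request} would use ${nextProxy()}`);
}
```

Calling nextProxy() before each page load, or once per batch, keeps the rotation policy in one place instead of tying it to the loop structure.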
Key Steps Explained:
The for loop cycles through each endpoint, launching a separate Chromium instance. This approach distributes requests across multiple IPs rather than funneling them through just one.
page.waitForSelector pauses the script until .list-card-info elements appear, indicating listings have loaded.
With the structured data in hand, you can move on to analysis, such as tracking market trends or property values.
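One possible next step is to normalize the scraped fields and export them for analysis. The sketch below assumes listings shaped like the { address, price, details, link } objects produced by the script, with prices written as full dollar amounts such as "$1,250,000". The helper names parsePrice and toCsv and the sample listing are hypothetical.

```javascript
/**
 * Converts a price string such as "$1,250,000" into a number.
 * Assumes full dollar amounts (not abbreviations like "$1.25M").
 * Returns null when the text contains no digits.
 */
function parsePrice(text) {
  const digits = String(text ?? '').replace(/[^0-9]/g, '');
  return digits ? Number(digits) : null;
}

/**
 * Serializes listings to CSV, quoting every field and doubling embedded
 * quotes so that commas inside addresses do not break columns.
 */
function toCsv(listings) {
  const columns = ['address', 'price', 'details', 'link'];
  const escape = (value) => `"${String(value ?? '').replace(/"/g, '""')}"`;
  const rows = listings.map((listing) =>
    columns.map((column) => escape(listing[column])).join(',')
  );
  return [columns.join(','), ...rows].join('\n');
}

// Made-up sample listing, for illustration only
const sample = [
  {
    address: '123 Example St, San Francisco, CA',
    price: '$1,250,000',
    details: '2 bds',
    link: 'https://example.com/listing/1'
  }
];
const cleaned = sample.map((listing) => ({ ...listing, price: parsePrice(listing.price) }));
console.log(toCsv(cleaned));
```

Storing prices as numbers makes it straightforward to sort, filter, or aggregate listings later.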
Playwright’s ability to navigate and scrape JavaScript-driven websites, combined with a set of ISP proxy endpoints, offers a reliable framework for extracting data from sites like Zillow. By manually rotating through multiple IP addresses, you can reduce how often you run into common scraping roadblocks such as rate limits and captchas, giving you more consistent access to the real estate market insights you need.
In an environment where information is a strategic asset, Playwright stands as a powerful ally for collecting clean, actionable datasets. Whether you’re analyzing market trends, tracking property values, or compiling a dataset for an AI model, manually rotating ISP IPs gives you the control necessary to scale responsibly and effectively, even on heavily protected or dynamic sites.