Infinity Constellation

Amplifying founders and building companies with exponential potential, founded by Invisible with a focus on AI services

Senior Web Scraping Engineer

Engineer, Full Time, Remote, Team 1-10, Since 2023, H1B No Sponsor

Location

United States

Posted

103 days ago

Salary

Not specified

3 yrs exp, English, Airflow, AWS, Azure, BigQuery, Cloud, Docker, ETL, Google Cloud Platform, Grafana, JavaScript, Postgres, Prometheus, Python, Selenium

Job Description

  • Design, implement, and maintain web scraping pipelines for a wide variety of websites and data sources.
  • Build scrapers using tools and frameworks such as Selenium, Playwright, BeautifulSoup, Scrapy (and similar libraries) with a focus on reliability, performance, and maintainability.
  • Create automated workflows for scraping and data processing:
      • Containerize scraping jobs (e.g., using Docker).
      • Deploy and orchestrate them in the cloud (e.g., AWS, GCP, Azure).
      • Configure scheduling (e.g., run daily/weekly/hourly) and dependency management.
  • Implement monitoring, alerting, and logging:
      • Capture detailed logs for each job run.
      • Track job statuses and failures.
      • Implement notifications/alerts when a scraper breaks or a website changes.
  • Handle anti-bot measures (proxies, captchas, rate limits) and design scrapers that are resilient to layout and structure changes.
  • Work closely with data engineering / product / ML teams to understand data requirements and ensure data quality.
  • Utilize LLMs (Large Language Models) to:
      • Parse and extract structured information from messy HTML or semi-structured content.
      • Increase robustness of scrapers to frequent UI/DOM changes.
      • Prototype new scraping / extraction strategies using LLM APIs.
  • Write clean, well-tested, and well-documented code, and contribute to best practices, code reviews, and tooling for the team.
  • Continuously improve the scraping platform, including performance optimizations, standardization, and reusability of components.

Job Requirements

  • 3+ years of professional experience working with web scraping or data collection at scale.
  • Strong proficiency in Python and common scraping libraries/frameworks such as: Selenium, Playwright, BeautifulSoup, Scrapy (or similar).
  • Solid understanding of HTML, CSS, JavaScript, HTTP, and browser behavior.
  • Experience building automated, production-grade workflows:
      • Orchestrators / schedulers (e.g., Airflow, Prefect, Dagster, or similar).
      • Building ETL/ELT pipelines and integrating with databases, data warehouses, or storage (e.g., PostgreSQL, BigQuery, S3, GCS).
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure), including:
      • Deploying and running scheduled jobs.
      • Managing infrastructure-as-code or similar deployment processes.
  • Strong experience with logging, monitoring, and alerting:
      • Ability to design logging for scraping jobs and to debug failures from logs.
      • Familiarity with tools like CloudWatch, Stackdriver, ELK, Prometheus, Grafana, or similar.
  • Experience with containers (Docker) and familiarity with CI/CD workflows.
  • Exposure to LLMs (e.g., OpenAI, Anthropic) for tasks like parsing, information extraction, or automation.
  • Strong problem-solving skills and the ability to debug complex, dynamic websites.
  • Comfortable working in a fast-paced environment, with good communication skills in English.

Benefits

  • Fully remote, flexible hours
  • Work on a global team, with real-world challenges
  • Payment in USD (contractor/freelance basis)
