
Senior Web Scraping Engineer — Labrynth

Remote / Online - Candidates ideally in New York, New York County, New York, 10261, USA
Listing for: Infinity
Remote/Work from Home position
Listed on 2026-01-10
Job specializations:
  • IT/Tech
    Data Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: USD 5,000 / month
Job Description
Location: New York

Stack @ Labrynth:
· GCP
· Python
· Pydantic/PydanticAI
· Docling
· Django
· Cloud Run
· LLMs
· GitHub
· ClickUp
· Selenium

About Labrynth:

At Labrynth, we’re a Silicon Valley startup building next-generation Hermeneutical-Agent systems — AI that can read, reason, and execute on the world’s most complex regulations.
Our Application Validator is live, performing audit-grade, evidence-grounded compliance checks. Next, we’re expanding the Application Generator to create regulator-ready drafts backed by verified data and citations.

You’ll help shape both — advancing safety, evaluation rigor, latency, and cost-efficiency across large-scale, production AI systems at the edge of applied research and real-world impact.

Our mission is to transform bureaucratic and complex processes using AI and automation, turning them into fast, transparent, and scalable pipelines. We are a spin-off from the world’s largest AI model trainer, Invisible Technologies, and are backed by the Infinity Constellation group. We already work with enterprise clients, governments, and large-scale projects, so you will have a real impact accelerating major developments.

About the Role

We are looking for a Senior Web Scraping Engineer to design, build, and operate large-scale data collection systems. You will be responsible for developing robust scrapers using tools such as Selenium, Beautiful Soup, Playwright, and Scrapy, and for creating automated workflows in the cloud that run reliably on a schedule, generate logs, and surface failures proactively.
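
As a rough illustration of what such a scraper can look like (a minimal sketch only, not Labrynth's code: the URL, CSS selector, and retry policy below are invented for the example), a resilient fetch-and-parse step in Python with requests and Beautiful Soup might be:

```python
import logging
import time

import requests
from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("example_scraper")

LISTING_URL = "https://example.com/regulations"  # hypothetical target
MAX_RETRIES = 3

def fetch(url: str) -> str:
    """Fetch a page with retry and exponential backoff so transient errors don't kill the run."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            logger.warning("Attempt %d/%d failed for %s: %s", attempt, MAX_RETRIES, url, exc)
            time.sleep(2 ** attempt)  # back off before the next attempt
    raise RuntimeError(f"Giving up on {url} after {MAX_RETRIES} attempts")

def scrape_titles(url: str) -> list[str]:
    """Parse the page and return document titles; the selector is illustrative only."""
    soup = BeautifulSoup(fetch(url), "html.parser")
    titles = [el.get_text(strip=True) for el in soup.select("h2.document-title")]
    logger.info("Extracted %d titles from %s", len(titles), url)
    return titles

if __name__ == "__main__":
    scrape_titles(LISTING_URL)
```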

You will also experiment with and apply LLM-based techniques to improve scraping robustness and data extraction quality.
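
One common shape for that kind of LLM-assisted extraction (a sketch under assumptions: the posting names no provider, so the OpenAI client, model name, and `Filing` schema below are placeholders) is to ask the model for JSON and validate it with Pydantic:

```python
import json

from openai import OpenAI          # assumed provider; any chat-completion API works similarly
from pydantic import BaseModel, ValidationError

class Filing(BaseModel):
    """Illustrative schema for records pulled out of messy HTML."""
    title: str
    agency: str
    effective_date: str | None = None

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_filing(raw_html: str) -> Filing | None:
    """Ask the model for JSON matching the schema, then validate it with Pydantic."""
    prompt = (
        "Extract the filing title, issuing agency, and effective date from the HTML below. "
        "Respond with a single JSON object with keys: title, agency, effective_date.\n\n"
        + raw_html
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                       # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},   # request strict JSON output
    )
    try:
        return Filing.model_validate(json.loads(resp.choices[0].message.content))
    except (json.JSONDecodeError, ValidationError) as exc:
        # Surface validation failures instead of silently dropping records.
        print(f"Extraction failed validation: {exc}")
        return None
```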

Key Responsibilities
  • Design, implement, and maintain web scraping pipelines for a wide variety of websites and data sources.

  • Build scrapers using tools and frameworks such as Selenium, Playwright, Beautiful Soup, Scrapy (and similar libraries) with a focus on reliability, performance, and maintainability.

  • Create automated workflows for scraping and data processing:

    • Containerize scraping jobs (e.g., using Docker).

    • Deploy and orchestrate them in the cloud (e.g., AWS, GCP, Azure).

    • Configure scheduling (e.g., run daily/weekly/hourly) and dependency management.

  • Implement monitoring, alerting, and logging (see the entrypoint sketch after this list):

    • Capture detailed logs for each job run.

    • Track job statuses and failures.

    • Implement notifications/alerts when a scraper breaks or a website changes.

  • Handle anti-bot measures (proxies, captchas, rate limits) and design scrapers that are resilient to layout and structure changes (a rate-limiting sketch follows this list).

  • Work closely with data engineering / product / ML teams to understand data requirements and ensure data quality.

  • Utilize LLMs (Large Language Models) to:

    • Parse and extract structured information from messy HTML or semi-structured content.

    • Increase robustness of scrapers to frequent UI/DOM changes.

    • Prototype new scraping / extraction strategies using LLM APIs.

  • Write clean, well-tested, and well-documented code, and contribute to best practices, code reviews, and tooling for the team.

  • Continuously improve the scraping platform, including performance optimizations, standardization, and reusability of components.
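
To make the monitoring and alerting bullets above concrete, here is a minimal job-entrypoint sketch (the webhook variable and job name are assumptions; the scheduling itself would live in Cloud Scheduler, cron, or an orchestrator rather than in this script):

```python
import logging
import os
import sys

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("scrape_job")

# Hypothetical alert channel; any pager or chat webhook works the same way.
ALERT_WEBHOOK = os.environ.get("ALERT_WEBHOOK_URL", "")

def notify_failure(job_name: str, error: str) -> None:
    """Post a short alert so a broken scraper is noticed, not discovered days later."""
    if ALERT_WEBHOOK:
        requests.post(ALERT_WEBHOOK, json={"text": f"{job_name} failed: {error}"}, timeout=10)

def run_job() -> None:
    """Placeholder for the actual scraping and processing work."""
    logger.info("Starting scrape run")
    # ... call the scraper here ...
    logger.info("Scrape run finished")

if __name__ == "__main__":
    try:
        run_job()
    except Exception as exc:  # broad on purpose: any failure should trigger an alert
        logger.exception("Job failed")
        notify_failure("regulations-scraper", str(exc))
        sys.exit(1)  # non-zero exit lets Cloud Run Jobs / cron mark the run as failed
```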
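
For the anti-bot bullet above, a simple illustration of polite rate limiting with proxy and user-agent rotation (the proxy pool and delays are invented; real pools come from a proxy provider, and captcha handling is out of scope here):

```python
import itertools
import random
import time

import requests

# Placeholder proxy pool and user agents; in practice these come from a proxy provider.
PROXIES = itertools.cycle([
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
])
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]

def polite_get(url: str, min_delay: float = 1.0, max_delay: float = 3.0) -> requests.Response:
    """Fetch through a rotating proxy with a randomized delay and backoff on HTTP 429."""
    while True:
        time.sleep(random.uniform(min_delay, max_delay))  # stay under per-IP rate limits
        proxy = next(PROXIES)
        resp = requests.get(
            url,
            headers={"User-Agent": random.choice(USER_AGENTS)},
            proxies={"http": proxy, "https": proxy},
            timeout=30,
        )
        if resp.status_code == 429:  # rate limited: honor Retry-After if present
            time.sleep(float(resp.headers.get("Retry-After", "30")))
            continue
        resp.raise_for_status()
        return resp
```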

Requirements
  • 3+ years of professional experience working with web scraping or data collection at scale.

  • Strong proficiency in Python and common scraping libraries/frameworks such as:

    • Selenium, Playwright, Beautiful Soup, Scrapy (or similar).

  • Solid understanding of HTML, CSS, JavaScript, HTTP, and browser behavior.

  • Experience building automated, production-grade workflows (a brief orchestration sketch follows this list):

    • Orchestrators / schedulers (e.g., Airflow, Prefect, Dagster, or similar).

    • Building ETL/ELT pipelines and integrating with databases, data warehouses, or storage (e.g., PostgreSQL, BigQuery, S3, GCS).

  • Hands‑on experience with cloud platforms (AWS, GCP, or Azure), including:

    • Deploying and running scheduled jobs.

    • Managing infrastructure-as-code or similar deployment processes.

  • Strong experience with logging, monitoring, and alerting:

    • Ability to design logging for scraping jobs and to debug failures from logs.

    • Familiarity with tools like CloudWatch, Stackdriver, ELK, Prometheus, Grafana, or similar.

  • Experience with containers (Docker) and familiarity with CI/CD…
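
As a sketch of the orchestration experience described above (the DAG id, schedule, and task bodies are invented; any of the listed orchestrators would work similarly), a small Airflow DAG that chains an extract step into a warehouse load might look like:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_listings() -> None:
    # Placeholder: call the scraper and stage raw results (e.g., to GCS or S3).
    print("extracting")

def load_warehouse() -> None:
    # Placeholder: load staged data into the warehouse (e.g., BigQuery or PostgreSQL).
    print("loading")

with DAG(
    dag_id="regulations_scrape",          # hypothetical pipeline name
    start_date=datetime(2026, 1, 1),
    schedule_interval="@daily",           # run once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_listings)
    load = PythonOperator(task_id="load", python_callable=load_warehouse)
    extract >> load                       # load runs only after a successful extract
```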

To View & Apply for jobs on this site that accept applications from your location or country, tap the button below to make a Search.
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)

Job Posting Language
Employment Category
Education (minimum level)
Filters
Education Level
Experience Level (years)
Posted in last:
Salary