
Senior Web Scraping Engineer — Labrynth

Remote / Online - Candidates ideally in New York, New York County, New York, 10261, USA
Listing for: Infinity
Remote/Work from Home position
Listed on 2026-01-10
Job specializations:
  • IT/Tech
    Data Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: USD 5,000 / month
Job Description
Location: New York

Stack @ Labrynth:
· GCP
· Python
· Pydantic/PydanticAI
· Docling
· Django
· Cloud Run
· LLMs
· GitHub
· ClickUp
· Selenium

About Labrynth:

At Labrynth, we’re a Silicon Valley startup building next-generation Hermeneutical-Agent systems — AI that can read, reason, and execute on the world’s most complex regulations.
Our Application Validator is live, performing audit-grade, evidence-grounded compliance checks. Next, we’re expanding the Application Generator to create regulator-ready drafts backed by verified data and citations.

You’ll help shape both — advancing safety, evaluation rigor, latency, and cost-efficiency across large-scale, production AI systems at the edge of applied research and real-world impact.

Our mission is to transform bureaucratic and complex processes using AI and automation, turning them into fast, transparent, and scalable pipelines. We are a spin-off from the world’s largest AI model trainer, Invisible Technologies, and are backed by the Infinity Constellation group. We already work with enterprise clients, governments, and large-scale projects, so you will have a real impact accelerating major developments.

About the Role

We are looking for a Senior Web Scraping Engineer to design, build, and operate large-scale data collection systems. You will be responsible for developing robust scrapers using tools such as Selenium, Beautiful Soup, Playwright, and Scrapy, and for creating automated workflows in the cloud that run reliably on a schedule, generate logs, and surface failures proactively.
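
As a rough illustration of what such a scraper can look like (a minimal sketch only, not Labrynth's code: the URL, CSS selector, and retry policy below are invented for the example), a resilient fetch-and-parse step in Python with requests and Beautiful Soup might be:

```python
import logging
import time

import requests
from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("example_scraper")

LISTING_URL = "https://example.com/regulations"  # hypothetical target
MAX_RETRIES = 3

def fetch(url: str) -> str:
    """Fetch a page with retry and exponential backoff so transient errors don't kill the run."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            logger.warning("Attempt %d/%d failed for %s: %s", attempt, MAX_RETRIES, url, exc)
            time.sleep(2 ** attempt)  # back off before the next attempt
    raise RuntimeError(f"Giving up on {url} after {MAX_RETRIES} attempts")

def scrape_titles(url: str) -> list[str]:
    """Parse the page and return document titles; the selector is illustrative only."""
    soup = BeautifulSoup(fetch(url), "html.parser")
    titles = [el.get_text(strip=True) for el in soup.select("h2.document-title")]
    logger.info("Extracted %d titles from %s", len(titles), url)
    return titles

if __name__ == "__main__":
    scrape_titles(LISTING_URL)
```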

You will also experiment with and apply LLM-based techniques to improve scraping robustness and data extraction quality.
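
One common shape for that kind of LLM-assisted extraction (a sketch under assumptions: the posting names no provider, so the OpenAI client, model name, and `Filing` schema below are placeholders) is to ask the model for JSON and validate it with Pydantic:

```python
import json

from openai import OpenAI          # assumed provider; any chat-completion API works similarly
from pydantic import BaseModel, ValidationError

class Filing(BaseModel):
    """Illustrative schema for records pulled out of messy HTML."""
    title: str
    agency: str
    effective_date: str | None = None

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_filing(raw_html: str) -> Filing | None:
    """Ask the model for JSON matching the schema, then validate it with Pydantic."""
    prompt = (
        "Extract the filing title, issuing agency, and effective date from the HTML below. "
        "Respond with a single JSON object with keys: title, agency, effective_date.\n\n"
        + raw_html
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                       # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},   # request strict JSON output
    )
    try:
        return Filing.model_validate(json.loads(resp.choices[0].message.content))
    except (json.JSONDecodeError, ValidationError) as exc:
        # Surface validation failures instead of silently dropping records.
        print(f"Extraction failed validation: {exc}")
        return None
```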

Key Responsibilities
  • Design, implement, and maintain web scraping pipelines for a wide variety of websites and data sources.

  • Build scrapers using tools and frameworks such as Selenium, Playwright, Beautiful Soup, Scrapy (and similar libraries) with a focus on reliability, performance, and maintainability.

  • Create automated workflows for scraping and data processing:

    • Containerize scraping jobs (e.g., using Docker).

    • Deploy and orchestrate them in the cloud (e.g., AWS, GCP, Azure).

    • Configure scheduling (e.g., run daily/weekly/hourly) and dependency management.

  • Implement monitoring, alerting, and logging (see the entrypoint sketch after this list):

    • Capture detailed logs for each job run.

    • Track job statuses and failures.

    • Implement notifications/alerts when a scraper breaks or a website changes.

  • Handle anti-bot measures (proxies, captchas, rate limits) and design scrapers that are resilient to layout and structure changes (a rate-limiting sketch follows this list).

  • Work closely with data engineering / product / ML teams to understand data requirements and ensure data quality.

  • Utilize LLMs (Large Language Models) to:

    • Parse and extract structured information from messy HTML or semi-structured content.

    • Increase robustness of scrapers to frequent UI/DOM changes.

    • Prototype new scraping / extraction strategies using LLM APIs.

  • Write clean, well-tested, and well-documented code, and contribute to best practices, code reviews, and tooling for the team.

  • Continuously improve the scraping platform, including performance optimizations, standardization, and reusability of components.
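
To make the monitoring and alerting bullets above concrete, here is a minimal job-entrypoint sketch (the webhook variable and job name are assumptions; the scheduling itself would live in Cloud Scheduler, cron, or an orchestrator rather than in this script):

```python
import logging
import os
import sys

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("scrape_job")

# Hypothetical alert channel; any pager or chat webhook works the same way.
ALERT_WEBHOOK = os.environ.get("ALERT_WEBHOOK_URL", "")

def notify_failure(job_name: str, error: str) -> None:
    """Post a short alert so a broken scraper is noticed, not discovered days later."""
    if ALERT_WEBHOOK:
        requests.post(ALERT_WEBHOOK, json={"text": f"{job_name} failed: {error}"}, timeout=10)

def run_job() -> None:
    """Placeholder for the actual scraping and processing work."""
    logger.info("Starting scrape run")
    # ... call the scraper here ...
    logger.info("Scrape run finished")

if __name__ == "__main__":
    try:
        run_job()
    except Exception as exc:  # broad on purpose: any failure should trigger an alert
        logger.exception("Job failed")
        notify_failure("regulations-scraper", str(exc))
        sys.exit(1)  # non-zero exit lets Cloud Run Jobs / cron mark the run as failed
```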
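
For the anti-bot bullet above, a simple illustration of polite rate limiting with proxy and user-agent rotation (the proxy pool and delays are invented; real pools come from a proxy provider, and captcha handling is out of scope here):

```python
import itertools
import random
import time

import requests

# Placeholder proxy pool and user agents; in practice these come from a proxy provider.
PROXIES = itertools.cycle([
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
])
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]

def polite_get(url: str, min_delay: float = 1.0, max_delay: float = 3.0) -> requests.Response:
    """Fetch through a rotating proxy with a randomized delay and backoff on HTTP 429."""
    while True:
        time.sleep(random.uniform(min_delay, max_delay))  # stay under per-IP rate limits
        proxy = next(PROXIES)
        resp = requests.get(
            url,
            headers={"User-Agent": random.choice(USER_AGENTS)},
            proxies={"http": proxy, "https": proxy},
            timeout=30,
        )
        if resp.status_code == 429:  # rate limited: honor Retry-After if present
            time.sleep(float(resp.headers.get("Retry-After", "30")))
            continue
        resp.raise_for_status()
        return resp
```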

Requirements
  • 3+ years of professional experience working with web scraping or data collection at scale.

  • Strong proficiency in Python and common scraping libraries/frameworks such as:

    • Selenium, Playwright, Beautiful Soup, Scrapy (or similar).

  • Solid understanding of HTML, CSS, JavaScript, HTTP, and browser behavior.

  • Experience building automated, production-grade workflows (a brief orchestration sketch follows this list):

    • Orchestrators / schedulers (e.g., Airflow, Prefect, Dagster, or similar).

    • Building ETL/ELT pipelines and integrating with databases, data warehouses, or storage (e.g., PostgreSQL, BigQuery, S3, GCS).

  • Hands‑on experience with cloud platforms (AWS, GCP, or Azure), including:

    • Deploying and running scheduled jobs.

    • Managing infrastructure-as-code or similar deployment processes.

  • Strong experience with logging, monitoring, and alerting:

    • Ability to design logging for scraping jobs and to debug failures from logs.

    • Familiarity with tools like CloudWatch, Stackdriver, ELK, Prometheus, Grafana, or similar.

  • Experience with containers (Docker) and familiarity with CI/CD…
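
As a sketch of the orchestration experience described above (the DAG id, schedule, and task bodies are invented; any of the listed orchestrators would work similarly), a small Airflow DAG that chains an extract step into a warehouse load might look like:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_listings() -> None:
    # Placeholder: call the scraper and stage raw results (e.g., to GCS or S3).
    print("extracting")

def load_warehouse() -> None:
    # Placeholder: load staged data into the warehouse (e.g., BigQuery or PostgreSQL).
    print("loading")

with DAG(
    dag_id="regulations_scrape",          # hypothetical pipeline name
    start_date=datetime(2026, 1, 1),
    schedule_interval="@daily",           # run once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_listings)
    load = PythonOperator(task_id="load", python_callable=load_warehouse)
    extract >> load                       # load runs only after a successful extract
```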

To View & Apply for jobs on this site that accept applications from your location or country, tap the button below to make a Search.
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)

Job Posting Language
Employment Category
Education (minimum level)
Filters
Education Level
Experience Level (years)
Posted in last:
Salary