Web Scraping Engineer

Posted 12 Days Ago
Hiring Remotely in USA
Remote
Mid level
Fintech
The Role
The Web Scraping Engineer will design and implement large-scale crawling systems and maintain the data acquisition infrastructure, including building new crawlers and ensuring data quality. Responsibilities include collaborating with teams to understand requirements, monitoring crawler health, and managing data pipelines.
Intro description:

Legalist is an institutional alternative asset management firm. Founded in 2016 and incubated at Y Combinator, the firm uses data-driven technology to invest in credit assets at scale. We are always looking for talented people to join our team.

Where You Come In:

  • Help to design and implement the architecture of a large-scale crawling system
  • Design, implement, and maintain components of our data acquisition infrastructure (building new crawlers and maintaining existing crawlers, data cleaners, and loaders)
  • Develop tools that facilitate scraping at scale, monitor crawler health, and ensure the data quality of scraped items
  • Collaborate with our product and business teams to understand and anticipate requirements, so that our data gathering systems deliver greater functionality and impact

What you’ll be bringing to the team:

  • 3+ years of experience with Python for data wrangling and cleaning
  • 2+ years of experience with data crawling and scraping at scale (at least 100 spiders)
  • Production experience with Scrapy is mandatory. Distributed crawling and advanced Scrapy experience are a plus.
  • Familiarity with scraping libraries and monitoring tools is highly recommended (BeautifulSoup, XPath, Selenium, Puppeteer, Splash)
  • Familiarity with data pipelining to integrate scraped items into existing data pipelines.
  • Experience extracting data from multiple disparate sources including HTML, XML, REST, GraphQL, PDF, and spreadsheets.
  • Experience running, monitoring, and maintaining a large set of broad crawlers (100+ spiders)
  • Sound knowledge of bypassing bot detection techniques
  • Experience using techniques to protect web scrapers against site bans, IP leaks, browser crashes, CAPTCHAs, and proxy failures
  • Experience with cloud environments such as GCP and AWS, containerization tools such as Docker, and orchestration tools such as Kubernetes
  • Ability to maintain all aspects of a scraping pipeline end to end (building and maintaining spiders, avoiding bot prevention techniques, data cleaning and pipelining, monitoring spider health and performance)
  • OOP, SQL and Django ORM basics

Even better if you have, but not necessary:

  • Experience with microservices architecture
  • Familiarity with message brokers such as Kafka or RabbitMQ
  • Experience with DevOps
  • Expertise in data warehouse maintenance, specifically with Google BigQuery (ETLs, data sourcing, modeling, cleansing, documentation, and maintenance)
  • Familiarity with job scheduling and orchestration frameworks (e.g., Jenkins, Dagster, Prefect)

Top Skills

Python
The Company
87 Employees
Remote Workplace
Year Founded: 2016

What We Do

Legalist is an institutional alternative asset management firm. The firm uses data-driven technology to invest in credit assets at scale. Legalist combines proprietary sourcing technology with rigorous underwriting expertise to generate risk-adjusted returns for its clients.
