Web Scraping Engineer

Sorry, this job was removed at 05:34 p.m. (CST) on Wednesday, Jan 15, 2025
Hiring Remotely in USA
Remote
Fintech
The Role
Intro description:

Legalist is an institutional alternative asset management firm. Founded in 2016 and incubated at Y Combinator, the firm uses data-driven technology to invest in credit assets at scale. We are always looking for talented people to join our team.

Where You Come In:

  • Help to design and implement the architecture of a large-scale crawling system
  • Design, implement, and maintain various components of our data acquisition infrastructure (building new crawlers, maintaining existing crawlers, data cleaners & loaders)
  • Develop tools to facilitate scraping at scale, monitor the health of crawlers, and ensure the data quality of scraped items.
  • Collaborate with our product and business teams to understand and anticipate requirements, striving for greater functionality and impact in our data gathering systems

What you’ll be bringing to the team:

  • 3+ years of experience with Python for data wrangling and cleaning
  • 2+ years of experience with data crawling & scraping at scale (at least 100 spiders)
  • Production experience with Scrapy is mandatory. Distributed crawling and advanced Scrapy experience are a plus.
  • Familiarity with scraping libraries and monitoring tools is highly recommended (BeautifulSoup, XPath, Selenium, Puppeteer, Splash)
  • Familiarity with data pipelining to integrate scraped items into existing data pipelines.
  • Experience extracting data from multiple disparate sources including HTML, XML, REST, GraphQL, PDF, and spreadsheets.
  • Experience running, monitoring and maintaining a large set of broad crawlers (100+ spiders) 
  • Sound knowledge of bypassing bot detection techniques
  • Experience using techniques to protect web scrapers against site bans, IP leaks, browser crashes, CAPTCHAs, and proxy failures.
  • Experience with cloud environments like GCP and AWS, as well as containerization tools like Docker and orchestration tools such as Kubernetes or others.
  • Ability to maintain all aspects of a scraping pipeline end to end (building and maintaining spiders, avoiding bot prevention techniques, data cleaning and pipelining, monitoring spider health and performance).
  • OOP, SQL and Django ORM basics

Even better if you have, but not necessary:

  • Experience with microservices architecture would be a plus.
  • Familiarity with message brokers such as Kafka, RabbitMQ, etc.
  • Experience with DevOps
  • Expertise in data warehouse maintenance, specifically with Google BigQuery (ETLs, data sourcing, modeling, cleansing, documentation, and maintenance)
  • Familiarity with job scheduling & orchestration frameworks - e.g. Jenkins, Dagster, Prefect

The Company
87 Employees
Year Founded: 2016

What We Do

Legalist is an institutional alternative asset management firm. The firm uses data-driven technology to invest in credit assets at scale. Legalist combines proprietary sourcing technology with rigorous underwriting expertise to generate risk-adjusted returns for its clients.
