Lenfest Internship - Data Scientist

Philadelphia, PA, USA

The Philadelphia Inquirer is a public benefit corporation owned by the nonprofit Lenfest Institute for Journalism. Together, we're at the center of a critical mission to create a lasting future for ambitious, engaging, and useful local journalism. We're doing this, in part, by deepening our connection with the communities we serve. Our integrated digital and print platforms are the Philadelphia region's largest media network. We're passionate about building a sustainable model for indispensable local journalism, and we take pride in finding diverse, dynamic, and talented individuals to help push our team forward.

Description

The Philadelphia Inquirer, in partnership with the Brown Institute at Columbia Journalism School, is hiring students to assist with a project titled 'Developing machine learning applications that operationalize DEI best practices in local newsrooms'. The project includes producing open-source methods and automated tools that news organizations can use to extract geographic data from news coverage for analysis and future product development, and it builds upon existing work by the partnership.

The student will work under the supervision of members of The Philadelphia Inquirer and the Brown Institute at Columbia Journalism School, alongside a team of other researchers. The goal of the project is to produce a well-documented, open-source toolset that any local newsroom with an API or access to machine-readable content can use.

The first phase of the project focuses on the data pipeline and has two parts: (1) experimenting with NLP techniques to extract locations from news text and (2) building a pipeline for preparing training data that non-technical members of the project can audit and review.

The second phase of the project focuses on the machine learning portion of the pipeline. The goal is to iterate on fine-tuning BERT and to explore other potential ML opportunities.

The third phase of the project focuses on location extraction and on building a pipeline that lets others use the tool.

The final phase of the project focuses on geocoding: taking the entities identified in the second and third phases and assigning them geographic locations using geocoding services, gazetteers, and other third-party data providers.

Deliverables

The student will work among a team of researchers to deliver on these project phases. The student is expected to attend weekly meetings with the team and supervisors to review progress throughout each phase.

Key deliverables include the following:

Phase one: Building a pipeline for NLP location extraction using open-source libraries with applied heuristics. This includes documenting the performance of each approach taken and making recommendations for the final application. The pipeline will be delivered as a Colab notebook or set of notebooks. The student (and team) will also be responsible for building a pipeline for assembling a training dataset; the actual data assembly will be performed by another team of researchers.
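The posting does not specify which libraries or heuristics phase one will use. As an illustration only, here is a minimal, dependency-free sketch of a heuristic location extractor: it matches capitalized phrases against a small hypothetical gazetteer and also accepts any capitalized phrase that directly follows a cue word such as "in" or "at". A real pipeline would more likely start from an open-source NER model; the gazetteer contents, the cue words, and the `extract_locations` function are all assumptions made for this example.

```python
import re
from typing import List, Set

# Hypothetical toy gazetteer of place names. A real pipeline would load a much
# larger list or rely on an open-source NER model instead.
GAZETTEER: Set[str] = {"Philadelphia", "South Philadelphia", "Kensington", "Camden"}

# Hypothetical heuristic cue words: a capitalized phrase directly after one of
# these is treated as a likely location even when it is not in the gazetteer.
LOCATION_CUES: Set[str] = {"in", "at", "near", "from"}

# One or more consecutive capitalized words, e.g. "City Hall".
CAPITALIZED_PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract_locations(text: str) -> List[str]:
    """Return candidate location strings in order of first appearance."""
    found: List[str] = []
    for match in CAPITALIZED_PHRASE.finditer(text):
        phrase = match.group(0)
        # The word immediately before the phrase, punctuation stripped, lowercased.
        preceding_words = text[: match.start()].split()
        cue = preceding_words[-1].strip(".,;:").lower() if preceding_words else ""
        if (phrase in GAZETTEER or cue in LOCATION_CUES) and phrase not in found:
            found.append(phrase)
    return found


print(extract_locations(
    "A fire broke out in Kensington on Tuesday, and residents of "
    "South Philadelphia met at City Hall."
))
# prints ['Kensington', 'South Philadelphia', 'City Hall']
```

Heuristics like these are easy for non-technical reviewers to audit, but they have obvious failure modes (a person's name after "from" is a false positive; an unlisted place at the start of a sentence is missed), which is the kind of performance trade-off phase one asks to be documented.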

Phase two: The student (and team) will iterate on the parameters used to fine-tune BERT and will document the accuracy of the resulting model(s). The student (and team) will also experiment with other ML-based approaches and provide recommendations alongside the NLP recommendations developed in phase one. Output from this phase will be delivered as notes as well as a Colab notebook or set of notebooks.
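The posting asks for the accuracy of the fine-tuned model(s) to be documented but does not name a metric. For location extraction, entity-level precision, recall, and F1 are a common choice, so the sketch below assumes that metric and exact string matching of (document, location) pairs. The `precision_recall_f1` function and the `Entity` type are hypothetical names for illustration, not part of any existing project code, and the labels are made up.

```python
from typing import Dict, Set, Tuple

# One labeled or predicted location: (document id, location string).
Entity = Tuple[str, str]


def precision_recall_f1(predicted: Set[Entity], gold: Set[Entity]) -> Dict[str, float]:
    """Entity-level scores using exact matches between predicted and gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up labels, for illustration only.
gold = {("story-1", "Kensington"), ("story-1", "City Hall"), ("story-2", "Camden")}
predicted = {("story-1", "Kensington"), ("story-2", "Camden"), ("story-2", "Tuesday")}
print(precision_recall_f1(predicted, gold))
```

Because matching is exact, predicting "Philadelphia" where the label is "South Philadelphia" counts as both a false positive and a false negative; scoring partial overlaps instead is a separate design decision a team would need to make and document.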

Phase three: The student (and team) will work on producing a pipeline that applies the ML model, as well as NLP-based approaches, to extract location entities from any sentence or news story. Deliverables from this phase of the project will be provided as a Colab notebook or set of notebooks, including documentation on ingesting data from a variety of inputs, such as file uploads and connections to an API or database.

Phase four: The student (and team) will work on enhancing the geocoding aspect of the pipeline. This includes providing additional context for the entities passed to geocoding APIs to improve the results they return. It also includes providing inputs that let new users of the tool specify their paper's location (and which locations are relevant), building a database of returned locations to avoid repeat geocoding costs, and constructing a pipeline that connects to third-party data providers (e.g., Placekey) for better results. Deliverables from this phase of the project will be provided as a Colab notebook or set of notebooks.
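Two of the phase-four ideas, adding context to an entity before it is sent to a geocoder and storing returned locations so the same query is never paid for twice, can be sketched together. This is a minimal illustration under stated assumptions: the geocoding service is abstracted as any function that takes a query string and returns a `(latitude, longitude)` pair or `None`, the "context" is simply the paper's home location appended to the entity, and the cache is a SQLite table from Python's standard library. `CachedGeocoder` and its parameters are hypothetical names, not an existing API.

```python
import sqlite3
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]


class CachedGeocoder:
    """Hypothetical wrapper: adds home-location context and caches results in SQLite."""

    def __init__(self, geocode_fn: Callable[[str], Optional[Coords]],
                 home_context: str, db_path: str = ":memory:") -> None:
        self.geocode_fn = geocode_fn      # assumed: any paid geocoding call
        self.home_context = home_context  # e.g. the paper's city and state
        self.api_calls = 0                # number of times geocode_fn was invoked
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS geocache "
            "(query TEXT PRIMARY KEY, lat REAL, lon REAL)"
        )

    def _with_context(self, entity: str) -> str:
        # Append the home location so "Kensington" is not resolved elsewhere.
        return f"{entity}, {self.home_context}"

    def geocode(self, entity: str) -> Optional[Coords]:
        query = self._with_context(entity)
        row = self.conn.execute(
            "SELECT lat, lon FROM geocache WHERE query = ?", (query,)
        ).fetchone()
        if row is not None:
            # Cache hit: a stored NULL latitude means an earlier failed lookup.
            return None if row[0] is None else (row[0], row[1])
        self.api_calls += 1
        result = self.geocode_fn(query)
        lat, lon = result if result is not None else (None, None)
        self.conn.execute("INSERT INTO geocache VALUES (?, ?, ?)", (query, lat, lon))
        self.conn.commit()
        return result
```

Failed lookups are cached as well, so an entity that a service cannot resolve does not trigger a second paid request. Passing a file path instead of `":memory:"` as `db_path` would make the cache persist between notebook sessions, and `home_context` is one simple way to implement the user-supplied paper location the posting describes.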

Resources required

All work on the project will be conducted off-site on personal equipment. Data will be provided by The Philadelphia Inquirer, as well as by another team of researchers responsible for constructing the training dataset. Computational resources, including servers and storage, as well as video meetings, will be provided by The Philadelphia Inquirer and the Brown Institute at Columbia Journalism School.

Final product

The student (and team) will deliver a folder of Colab notebooks as well as extensive documentation for each phase of the project. If the student (and team) are interested in producing a research paper documenting one or more aspects of the project and its output, they will receive supervision to do so.

Pay Rate: $25.00/hour

*We know not everyone reading this will fit exactly what we've described. We encourage anyone to apply who shares our passion for indispensable journalism and our drive to create a sustainable business model to support it. As an equal opportunity employer, The Inquirer is committed to fostering a diverse and inclusive culture, and we especially encourage members of underrepresented communities to submit an application, including women, people of color, LGBTQ people, and people with special needs.*
We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.

Other details

  • Pay Type: Hourly
  • Job Start Date: Monday, August 16, 2021



More Information on The Philadelphia Inquirer
The Philadelphia Inquirer operates in the digital media industry. The company is located in Philadelphia, PA, and has 1,001 total employees. It offers perks and benefits such as health insurance and paid sick days.