The european digital archive

Jobs

Please send your resume and cover letter to job at internetmemory dot org. The Internet Memory Foundation thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.

Development engineer internship, boilerplate detection
Development engineer internship, datasets
Development engineer internship, execution-based crawler
Development engineer internship, crawl patterns detection
Development engineer (May 2010)
Python Web Developer (April 2010)
Web Crawl Engineer (April 2010)
Distributed Architecture Developer (April 2010)

Development engineer internship, boilerplate detection

The Internet Memory Foundation (formerly European Archive) offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.

Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to improve the quality and completeness of web archives.

Mission

We use a complete natural language processing stack for full-text search and mining of our archive. One fundamental step in web page analysis is boilerplate elimination: getting rid of advertisements, navigation bars, footers and the like.

The goal of this internship is to compare existing tools and improve the quality of the boilerplate removal. The tasks, under the supervision of an engineer, will include the design, technical specifications, implementation (code and automatic tests) and documentation.
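As an illustration of what boilerplate elimination can mean in practice, here is a minimal sketch of one common heuristic: discard text blocks in which most words sit inside links, as navigation bars and footers tend to do. The names `BlockCollector`, `main_content` and `BLOCK_TAGS`, the 0.5 threshold and the sample page are assumptions made for this example; they do not describe the tools used or evaluated at the foundation.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption made for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "ul", "h1", "h2", "h3"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks and record each block's link density."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # (text, link_density) for every non-empty block
        self._words = []
        self._link_words = 0
        self._in_link = False

    def flush(self):
        # Close the current block, if it holds any text.
        if self._words:
            density = self._link_words / len(self._words)
            self.blocks.append((" ".join(self._words), density))
        self._words, self._link_words = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)


def main_content(html, max_link_density=0.5):
    """Return the text of blocks whose link density is at most the threshold."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    return [text for text, density in collector.blocks
            if density <= max_link_density]


# Hypothetical sample page: a navigation bar followed by a paragraph.
SAMPLE = (
    '<div><a href="/">Home</a> <a href="/jobs">Jobs</a></div>'
    '<p>The archive preserves web pages for future generations. '
    '<a href="/more">More</a></p>'
)
kept = main_content(SAMPLE)
```

On the sample page the navigation block is dropped because every word in it is link text, while the paragraph is kept. A real comparison of tools would need an annotated corpus and more robust features than this single ratio.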

Profile

  • Completing the final year of a master's degree in computer science
  • Autonomous, team player
  • HTML, JavaScript, DOM
  • Knowledge of web protocols
  • Python development experience a plus
  • Knowledge of Linux a plus
  • Erlang or other functional language experience a plus
  • Good command of English
  • French a plus

Details

  • Contract type: internship, full time, 5 months
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting
  • 1500€ per month

Please mention "boilerplate detection internship" in the subject of your application e-mail.

Development engineer internship, datasets

The Internet Memory Foundation (formerly European Archive) offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.

Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to improve the quality and completeness of web archives.

Mission

The web hosts vast amounts of tabular data. Compiling all this information can yield interesting results. However, detecting and processing this data is a challenge.

The main goal of this internship is to study and implement methods to detect tabular data in our archive, and classify it. The tasks, under the supervision of an engineer, will include design, technical specifications, implementation (code and automatic tests) and documentation.
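To give an idea of the detection problem, the sketch below separates data tables from layout tables with a deliberately naive rule: a table counts as tabular data when it has at least three rows and every row has the same number of cells (at least two). The names `TableShape` and `looks_like_data_table`, the thresholds and the sample markup are assumptions made for this example, not the methods to be studied during the internship.

```python
from html.parser import HTMLParser


class TableShape(HTMLParser):
    """Record the number of cells in each table row of a page."""

    def __init__(self):
        super().__init__()
        self.rows = []       # cell count per completed row
        self._cells = None   # cell count of the row being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = 0
        elif tag in ("td", "th") and self._cells is not None:
            self._cells += 1

    def handle_endtag(self, tag):
        if tag == "tr" and self._cells is not None:
            self.rows.append(self._cells)
            self._cells = None


def looks_like_data_table(html, min_rows=3, min_cols=2):
    """Naive test: enough rows, enough columns, and a regular shape.

    Nested tables are not handled; this is a sketch, not a classifier.
    """
    shape = TableShape()
    shape.feed(html)
    shape.close()
    rows = shape.rows
    return (len(rows) >= min_rows
            and min(rows) >= min_cols
            and len(set(rows)) == 1)


# Hypothetical samples: a small data table and a layout table.
DATA_TABLE = (
    "<table>"
    "<tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Montreuil</td><td>100000</td></tr>"
    "<tr><td>Vincennes</td><td>50000</td></tr>"
    "</table>"
)
LAYOUT_TABLE = "<table><tr><td>menu</td></tr><tr><td>content</td></tr></table>"

is_data = looks_like_data_table(DATA_TABLE)
is_layout_data = looks_like_data_table(LAYOUT_TABLE)
```

The figures in the sample table are placeholders. Classifying the detected tables by topic or schema, as the mission describes, is a separate step that this sketch does not attempt.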

Profile

  • Completing the final year of a master's degree in computer science
  • Autonomous, team player
  • Python development experience a plus
  • Knowledge of web protocols a plus
  • Knowledge of Linux a plus
  • Erlang or other functional language experience a plus
  • Good command of English
  • French a plus

Details

  • Contract type: internship, full time, 5 months
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting
  • 1500€ per month

Please mention "datasets internship" in the subject of your application e-mail.

Development engineer internship, execution-based crawler

The Internet Memory Foundation (formerly European Archive) offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.

Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to improve the quality and completeness of web archives.

Mission

The web is becoming more and more dynamic: JavaScript-generated content, including AJAX, and Flash applications are pervasive. This hinders traditional crawlers, which rely on simple regular expression searches to find links in web pages. New approaches that execute the web pages have emerged. However, they are usually very resource-intensive, which prevents large-scale use.

The goal of this internship is to study execution-based techniques and try to eliminate graphical rendering to save resources. The quality and performance of the different methods will have to be compared. The tasks, under the supervision of an engineer, will include the assessment of different methods, and the design, technical specifications, implementation (code and automatic tests) and documentation of the software needed to experiment with headless crawlers and integrate them into our crawl infrastructure.
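The limitation of regular-expression link extraction described above can be shown with a small sketch. The pattern `HREF_RE`, the function `extract_links` and the sample page are assumptions made for this example and are not taken from the foundation's crawler.

```python
import re

# A simple pattern for href="..." attributes (an assumption for this sketch).
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html):
    """Return the href values found by plain pattern matching."""
    return HREF_RE.findall(html)


# Hypothetical page: one static link, one link built by a script at run time.
PAGE = """
<a href="/about.html">About</a>
<script>
  var a = document.createElement("a");
  a.setAttribute("href", "/archive/" + "2010" + "/index.html");
  document.body.appendChild(a);
</script>
"""

links = extract_links(PAGE)
```

Only the static link is found. The second URL exists only after the script has run, which is why an execution-based crawler has to evaluate the page, ideally without paying for graphical rendering.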

Profile

  • Completing the final year of a master's degree in computer science
  • Autonomous, team player
  • Knowledge of web protocols
  • HTML, JavaScript, DOM
  • Python development experience a plus
  • Knowledge of Linux a plus
  • Erlang or other functional language experience a plus
  • Good command of English
  • French a plus

Details

  • Contract type: internship, full time
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting
  • 1500€ per month

Please mention "execution-based crawler internship" in the subject of your application e-mail.

Development engineer internship, crawl patterns detection

The Internet Memory Foundation (formerly European Archive) offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.

Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to improve the quality and completeness of web archives.

Mission

Web crawlers fetch resources from the web, scanning each one for new links. This enables the discovery of many parts of web sites starting from a few entry points. However, legitimate dynamic content or specifically crafted crawler traps can get a crawler to fetch an endless stream of useless resources.

The goal of this internship is to determine which crawl patterns indicate a trap, implement a detection module and integrate it into our crawl infrastructure. The tasks, under the supervision of an engineer, will include the design, technical specifications, implementation (code and automatic tests) and documentation of the necessary software.
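Which patterns actually indicate a trap is the open question of the internship. As a purely illustrative starting point, the sketch below flags URLs whose path is unusually deep or repeats the same segment several times, two signs often associated with endless link loops. The function `looks_like_trap` and both thresholds are assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_trap(url, max_depth=12, max_repeats=3):
    """Flag URLs with very deep paths or heavily repeated path segments.

    Both thresholds are arbitrary values chosen for this sketch.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    counts = Counter(segments)
    return any(n >= max_repeats for n in counts.values())


# Hypothetical URLs: a looping path and an ordinary page.
looping = looks_like_trap("http://example.org/a/b/a/b/a/b/")
ordinary = looks_like_trap("http://example.org/news/2010/05/index.html")
```

A per-URL check like this misses traps that only show up across many fetches, such as calendars that generate an unbounded sequence of distinct dates, so a detection module would probably also need to look at patterns over a whole crawl.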

Profile

  • Completing the final year of a master's degree in computer science
  • Autonomous, team player
  • Knowledge of web protocols
  • HTML, JavaScript, DOM
  • Python development experience a plus
  • Knowledge of Linux a plus
  • Erlang or other functional language experience a plus
  • Good command of English
  • French a plus

Details

  • Contract type: internship, full time
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting
  • 1500€ per month

Please mention "crawl patterns detection internship" in the subject of your application e-mail.

Development engineer (May 2010)

The European Archive foundation offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.

The European Archive is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to improve the quality and completeness of web archives.

Mission

Design, technical specifications, implementation (code and automatic tests), documentation and maintenance of the archival platform.

Profile

  • Master's degree in computer science
  • 0 to 3 years of experience
  • Autonomous, team player
  • Knowledge of web protocols required
  • Knowledge of Linux
  • Python development experience a plus
  • Erlang or other functional language experience a plus
  • Pylons or Django experience a plus
  • Good command of English
  • French a plus

Details

  • Contract type: CDI (full time)
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting

Please send your resume and cover letter to jobs at europarchive dot org with the subject line "development engineer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.

Python Web Developer (April 2010)

The European Archive foundation offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.

Mission

Design, technical specifications, implementation (code and automatic tests), documentation and maintenance of the user interface to the on-line archival platform (used to launch and monitor crawls, for quality assurance...).

Profile

  • Master's degree in computer science
  • Autonomous, team player
  • Python development experience is required
  • Web interface development experience is required (HTTP, HTML, CSS, MVC design, Ajax, SQL)
  • Pylons or Django experience is a plus
  • Knowledge of Linux
  • Good command of English
  • French is a plus

Details

  • Contract type: Permanent (full time)
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting

Please send your resume and cover letter to jobs at europarchive dot org with the subject line "Web Developer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.

Web Crawl Engineer (April 2010)

The European Web Archive is looking for a new Web Crawl Engineer to join our Paris-based team and help us archive the Internet and preserve this information for future generations. Find out more about our organization and web archive at www.europarchive.org

Your responsibilities include:

  • Running a set of tools, including several web crawlers, to collect content from the Internet
  • Working with the QA team to ensure that the collected content is complete and of the highest quality
  • Monitoring all production systems using automated tools
  • Working directly with our partner national libraries, archives and universities to collect specific content on the Internet for preservation

Experience Needed:

  • Excellent knowledge of HTML, JavaScript and web technologies in general
  • Extensive use of Linux shell scripting
  • Experience with Internet protocols (HTTP is a must-have)
  • Ability to work in a loosely structured start-up environment

Education:

  • Computer Science Bachelor, Master or equivalent work experience

Details

  • Contract type: Permanent (full time)
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting

Please send your resume and cover letter to jobs at europarchive dot org with the subject line "Web Crawl Engineer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.

Distributed Architecture Developer (April 2010)

The European Archive is looking for an experienced developer to join our Paris-based engineering team to participate in the development of our distributed web archiving infrastructure. Find out more about our organization and web archive at www.europarchive.org

Your responsibilities include:

  • Participate in the specification of our evolving distributed web archiving platform.
  • Develop and integrate modules of the platform

Experience Needed:

  • Excellent knowledge of distributed platform development
  • Fluent in Python and Erlang
  • Experience with Internet protocols (HTTP is a must-have)
  • Ability to work in a loosely structured start-up environment

Education:

  • Computer Science PhD, Master or equivalent work experience

Details

  • Contract type: Permanent (full time)
  • Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
  • No telecommuting

Please send your resume and cover letter to jobs at europarchive dot org with the subject line "Platform Developer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.
