Posted over 1 year ago. Visible to the public. Repeats. Linked content.

What's so hard about PDF text extraction? ​

There is a common view that extracting text from a PDF document should not be too difficult. After all, the text is right there in front of our eyes and humans consume PDF content all the time with great success. Why would it be difficult to automatically extract the text data?

Turns out, much how working with human names is difficult due to numerous edge cases and incorrect assumptions, working with PDFs is difficult due to the extreme flexibility given by the PDF format.

Flaky tests are tests that sometimes fail for no obvious reason. They are the plague of many end-to-end (E2E) test suites that automate the browser through tools like Capybara and Selenium.

Join our free training event and learn to fix any flaky test suite, even in large legacy applications.

Owner of this card:

Avatar
Henning Koch
Last edit:
over 1 year ago
by Tobias Kraze
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Henning Koch to makandra dev
This website uses short-lived cookies to improve usability.
Accept or learn more