I'm very pleased to finally have reached the first alpha release of PdfPig (NuGet).
PdfPig (GitHub) is a C# library for reading text content from PDF files. It lets users extract and index the text in a PDF using C#.
The current version of the library provides access to the text and text positions in PDF documents.
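As a rough illustration of that surface, here is a minimal console sketch that prints each page's text and the position of every letter. It assumes the member names used by later PdfPig releases (`UglyToad.PdfPig.PdfDocument.Open`, `NumberOfPages`, `GetPage`, `Page.Text`, `Page.Letters`, `Letter.GlyphRectangle`); the alpha's API may differ slightly, and the `PdfTextDump` usage string is purely hypothetical.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

public static class Program
{
    public static int Main(string[] args)
    {
        // Expect the path to a PDF file as the only argument.
        if (args.Length == 0)
        {
            Console.Error.WriteLine("Usage: PdfTextDump <file.pdf>");
            return 1;
        }

        // PdfDocument is IDisposable; disposing it releases the underlying file.
        // Member names below follow later PdfPig releases (assumption for the alpha).
        using (PdfDocument document = PdfDocument.Open(args[0]))
        {
            // Pages are numbered from 1.
            for (int i = 1; i <= document.NumberOfPages; i++)
            {
                Page page = document.GetPage(i);

                // The page's text content as a single string.
                Console.WriteLine($"--- Page {page.Number} ---");
                Console.WriteLine(page.Text);

                // Each letter carries its value and its bounding box on the page.
                foreach (Letter letter in page.Letters)
                {
                    Console.WriteLine($"{letter.Value} at {letter.GlyphRectangle}");
                }
            }
        }

        return 0;
    }
}
```

Working at the letter level keeps the core small: grouping letters into words or lines for indexing can be layered on top using the positions.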
The library began as an effort to port PDFBox from Java to C# in order to provide a native open-source solution for reading PDFs with C#. PdfPig is Apache 2.0 licensed and therefore avoids questionably (i.e. not at all) 'open-source' copyleft viral licenses.
I had been using the PDFBox library through IKVM and started the project to investigate the effort required to make PDFBox work natively with C#.
To understand the specification better, I rewrote quite a few parts of the code, which resulted in many more bugs and fewer features than the original.
As the alpha is (hopefully) used and issues are reported, I will refine the initial public API. I can't foresee the API expanding much beyond its current surface area for the first proper release.