Thursday, April 25, 2019

Bytescout PDFExtractor CSV, XML, XLS, XLSX

Bytescout.PDFExtractor 6.12.0.2239

Bytescout PDF Extractor SDK for .NET 2.00-4.50, ASP.NET, ActiveX

(c) ByteScout 2008-2015

System Requirements: .NET framework installed
Works with: ASP.NET, .NET (Server side and client side), ActiveX, Visual Basic 6, Delphi, Classic ASP and others.

Benefits:

- Extracts data from tables in PDF files as CSV, XML, XLS, XLSX;
- Extracts embedded files and attachments from PDF;
- Splits, merges PDF documents, extracts single pages;
- Extracts text from PDF (from whole page or given rectangle);
- Extracts embedded images from PDF documents;
- Extracts document information from PDF (author, subject, producer, etc.);
- Detects tables in PDF files;
- Searches text inside PDF with support for regular expressions;
- Extracts data from FDF, XFA forms;
- Reads text from images using OCR with multiple Western and Asian languages supported;
- and much more!

Web-site:
http://bytescout.com/
 There is a newer version of this package available. 
See the version list below for details.

Release Notes

Bytescout PDF Extractor SDK for .NET, ASP.NET and ActiveX
------------------
6.11.2193 (August 3, 2015)
Batch Processing samples updated to show the use of Reset() method
C++ source code sample added for Pages Extraction
DocumentMerger adds Merge2(inputfile1, inputfile2, outputfile) method to merge 2 files
XLS Extractor minor bug-fixes
PDF Multitool now allows enabling/disabling text, image and vector layers, and adds advanced settings for text extraction
XML, CSV, Table extraction improves support for tables with empty cells inside columns

6.10.2136 (June 16, 2015)
improved PDF to Text extraction
.ExtractShadowLikeText property improved: better filtering for shadow-like text
improved stability and PDF text support

6.00.2071 (May 14, 2015)
PDF to XML, PDF To CSV, PDF To Text functionality improved
PDF To XLS command line sample added (based on vbscript)
PDF To HTML SDK adds new .DetectHyperLinks property (TRUE by default) to enable/disable automated links detection in the text
New SearchablePDFMaker (available for PRO licenses) to convert PDF into searchable PDF files
new properties in extractor: ConsiderFontNames, ConsiderFontSizes, ConsiderFontColors, ConsiderVerticalBorders in CFG files
header columns detection (when AutoAlignColumnsToHeader = true) improved
.DetectLinesInsteadOfParagraphs replaced with new .LineGroupingMode to control how lines are merged into paragraphs
IMPORTANT: PDF To XML fixes a long-standing issue with an incorrect Y coordinate for text objects (it pointed to the bottom left instead of the top left)
.TableXMinIntersectionRequiredInPercents and .TableYMinIntersectionRequiredInPercents properties added
C++ source code sample added
XML Extractor fixes missing empty columns in PreserveFormatting=true mode
Minor fixes in colors in some PDF files
support for multiple OCR languages added
PDF Multitool GUI: adds Copy to Clipboard button to TXT, CSV, XML and raster renderer dialogs
XLSExtractor: adds PageToWorksheet property to enable/disable generation of separate worksheets per page.
new .TextEncodingCodePage property
PDFViewerControl: adds ValidateContextMenu allowing user to add custom items to context menu
PDF Viewer control: adds properties ShowTextObjects, ShowImageObjects, ShowVectorObjects.
XMLExtractor now adds "OCRConfidence" attribute for recognized text 
PDF/A checking functionality (in beta)
improving controls and text checking and alignment according to the original layout. The issue was caused by the shift of Y coordinates in controls while parsing: that was incorrect. The correct way is to shif...
XML Extractor updated: now produces <CONTROL> tag for checkboxes and text fields
now uses the temp directory instead of the current directory.
checkboxes, radioboxes, editboxes, comboboxes are better supported
now allows partial trust callers.
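
The Y-coordinate fix noted above (bottom left versus top left) can be illustrated with a small sketch. It is not taken from the SDK: the function name and parameters are hypothetical, and it assumes the common PDF convention of a page origin at the bottom-left corner with Y growing upward, converted to a top-left origin with Y growing downward.

```python
def to_top_left(x, y_bottom, height, page_height):
    """Return the top-left corner of a text box in a top-left-origin space.

    Assumptions (hypothetical, not the SDK's API):
    - (x, y_bottom) is the box's bottom-left corner in a page space whose
      origin is the bottom-left corner of the page, with Y growing upward;
    - the result uses an origin at the top-left corner, with Y growing down.
    """
    # The top edge of the box sits at y_bottom + height in the Y-up space;
    # flipping the axis measures that edge from the top of the page instead.
    return x, page_height - (y_bottom + height)
```

If XML output produced before this release was post-processed with fixed offsets, those offsets are worth rechecking, because a text object's reported Y differs by roughly the text height between the two corner conventions.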


5.20.1781 (January 27, 2015)
PDF to XML, PDF to CSV, PDF to Text functionality improved
OCRMode now provides 9 modes
.DetectLineInsteadOfParagraph now works much better. Set it to False to capture multiline text in table cells!
PDF controls support improved
FDF and XFDF data extraction added
Table detection improved to support multiline text in cells and tables with absent rows
beta version of PDF/A validator added
minor fixes and improvements


5.10.1747 (November 25, 2014)
PDF to XML, PDF to CSV, PDF to Text functions improved
now supports text extraction from text controls
XML extractor now adds font style, size, name, text coordinates into <text> tags
ASP.NET sample for OCR usage added
new property OCRLanguageDataFolder to specify the location of "tessdata" folder
improved support of PDF files
improves support for rotated text
updated source code samples
updated documentation
minor improvements and fixes

5.00.1626 (August 14, 2014)
OCR (text from images) functionality added: now you may extract text from embedded images and repair damaged text
issue fixed with CSV and XML extractor missing last columns with some settings
improved support for damaged PDF files
multiline text search with word matching modes is now supported
now may search text with hyphens and on different lines: see new source code sample Find Text With Hyphens
new property .RTLTextAutoDetectionEnabled (false by default) to auto detect RTL languages
PDF Viewer GUI demo improved
minor improvements and fixes

4.00.1487 (May 30, 2014)
improved pdf to text, pdf to csv, pdf to xml
issue with extraction area fixed
Improved Unicode handling
new .ContentType to check if PDF is PDF, Portfolio or XFAForm 
new properties: Unwrap, ExtractionAreaUsageMode 
new AttachmentInfo class to obtain details about attachment
new XFA Form XML extraction support (see XFAFormExtractor and XFAFormToXML samples)
new ZUGFeRD PDF support added
Multithreading performance improved
Licensing updated: Now Licensing is per developer
new "match whole word" parameter to TextExtractor.Find()
improved XLS and XLSX output

3.40.1349 (March 10, 2014)
improved stability of the text extraction
issue with the very last text line missing in some PDF files fixed
tables with empty cells are handled better now
issue with incorrect extraction of overlapped text objects fixed
issue with missing spaces between words in some files fixed
issue with incorrect X coordinate returned while searching with extraction area defined fixed
minor bug-fixes and improvements

3.30.1240 (November 27, 2013)
improved support for PDF files in older formats
image flipping issue in some PDF files fixed 
improved text rendering in PDF files
minor bug-fixes

3.20.1209 (October 31, 2013)
table detection was not returning proper coordinates for 2nd and further tables, fixed
minor source code samples updates
DocumentSplitter now works with multipage TIF files 
minor bug-fixes

3.20.1200 (October 28, 2013)
minor rotated text issues fixed
table detection was not returning proper coordinates, fixed
minor bug-fixes

3.20.1179 (October 22, 2013)
pdf to text and pdf data extraction improved
new .AutoAlignColumnsToHeader (true by default) property to automatically align cells to the header column or not (switching this setting will help if you are getting some shifted cells)
new DocumentRotator class to rotate pages in PDF documents
new ExtractRawImages property in Images Extractor to define if we are extracting raw images or images with rotation and transformation applied 
improved support of PDF files with rotated objects and pages
new source code sample showing how to extract page found by a keyword "Find Keyword And Extract Page"
Images Extractor: SetExtractionArea() method added to define a rectangle area to extract images from 
improved Splitting Pages example
improved pages extraction from PDF
new RemoveUnusedResources method to remove unused resources from PDF to reduce file size
minor bug-fixes and improvements

3.20.1100 (August 22, 2013)
new method: DocumentSplitter.Split(sourcefile, splitPages) to extract multiple ranges of pages from the same PDF file
minor bug-fixes in pdf to text engine
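
The release note above does not say how the page ranges passed to Split are written. As a rough sketch only, assuming a hypothetical comma-separated form such as "1-3, 5, 7-" with 1-based page numbers, a range list could be parsed and validated like this before any splitting is attempted:

```python
def parse_page_ranges(spec, page_count):
    """Parse a string like "1-3, 5, 7-" into inclusive 1-based (start, end) pairs.

    The range syntax is an assumption made for illustration; it is not
    documented in the release notes.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, dash, last = part.partition("-")
        start = int(first)
        if not dash:
            end = start          # single page, e.g. "5"
        elif last.strip() == "":
            end = page_count     # open-ended range, e.g. "7-"
        else:
            end = int(last)
        if not 1 <= start <= end <= page_count:
            raise ValueError("page range out of bounds: %r" % part)
        ranges.append((start, end))
    return ranges
```

Validating the ranges against the page count up front keeps a bad range from producing a partial set of output files.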

3.20.1093 (August 5, 2013)
pdf to text minor functionality fixes
x64 installer improvements
minor fixes for error messages
PDFDocument.Dispose() no longer disposes the source PDF stream if that stream was supplied by the user (so the user should dispose of it)
improved PDF format support
minor bug-fixes
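
The Dispose() change above follows a common ownership rule: an object closes only the streams it opened itself. The sketch below shows that rule in Python with a hypothetical DemoDocument class; it is an illustration of the pattern, not the SDK's implementation.

```python
import io


class DemoDocument:
    """Hypothetical wrapper that closes its stream only if it opened it."""

    def __init__(self, source):
        if isinstance(source, str):
            # A path was given: this object opens the file and owns the stream.
            self._stream = open(source, "rb")
            self._owns_stream = True
        else:
            # A caller-supplied stream: the caller remains responsible for it.
            self._stream = source
            self._owns_stream = False

    def close(self):
        if self._owns_stream:
            self._stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


# Usage: the caller's stream stays open after the document is closed.
caller_stream = io.BytesIO(b"%PDF-1.4")
with DemoDocument(caller_stream):
    pass
still_open = not caller_stream.closed
```

Code written against earlier builds that relied on the document closing a supplied stream would need to close that stream explicitly.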

3.20.1075 (July 11, 2013)
improved PDF To CSV, PDF To XLS, PDF To XML extraction
improved PDF reading speed and stability
minor bug-fixes


3.10.1051 (June 29, 2013)
improved table extraction support
improved pdf files support

3.10.1038 (June 26, 2013)
improved text extraction support
issues fixed related to incorrect extraction area coordinates for some PDF files with scanned images 
speed improvements
improved support for various PDF files

3.10.942 (May 30, 2013)
improved pdf text extraction support
minor bug-fixes and improvements

3.10.899 (May 14, 2013)
improved pdf to text conversion
improved PDF reading support
more Visual Basic .NET, C# and VBScript source code samples added
documentation updated

3.00.864 (April 11, 2013)
improved PDF extraction support
improved PDF handling
pdf splitting and merging: new DocumentSplitter.OptimizeSplittedDocuments property to optimize PDF files after splitting (may decrease file size when needed)
improved PDF fonts handling
demo utility updated
source code samples updated to run on any .NET framework by default
minor bug-fixes


3.00.825 (March 12, 2013)
improved pdf to text, pdf to csv
demo utility PDF Viewer reworked and updated for better UI experience
minor improvements and fixes in PDF support
improved PDF stability while working with PDF files with high density vector graphics inside
improved support for indexed color palettes
improved embedded fonts rendering
better support for Unicode fonts
new .Version property to read exact version of the dll
minor updates and improvements


2.50.708 (November 11, 2012)
PDF data extraction speed improved
Windows 8 support improved
PDF images and colors support improved
PDF to csv, PDF to xml, PDF to xls/xlsx now skips first leading rows if they are empty
pdf text search now works better and provides more intelligent support for regular expressions
ActiveX support and installation improved and now provides single batches to run on Windows x86/x64 for Windows XP to 8 Pro 
new property: .ExtractShadowLikeText to enable/disable extraction of shadowed text (where it is used as effect to create visual shadows)
minor bug-fixes and improvements
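
Skipping leading empty rows, as described in this release, amounts to dropping rows from the top of a table until one contains a non-blank cell. A minimal sketch of that idea, using a hypothetical helper on rows represented as lists of strings (not the SDK's code):

```python
def drop_leading_empty_rows(rows):
    """Drop rows from the top until one has a non-blank cell.

    Only leading rows are removed; empty rows further down are kept.
    """
    for index, row in enumerate(rows):
        if any(cell.strip() for cell in row):
            return rows[index:]
    return []  # every row was empty
```

For example, a table whose first two rows are blank keeps its later blank rows, so row positions inside the body of the table are preserved.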


2.40.650 (November 1, 2012)
improved support for Unicode text extraction 
improved support for PDF/A pdf files 
issues with white stripes appearing on multiple images combined fixed
data extraction internal optimizations
improved support for 8 bit images inside PDF
vector drawings improved to provide better support for multiple small objects 
Color representation in images with indexed colors fixed
Type2 fonts support improved
Improved support for embedded fonts in PDF produced by Ghostscript engine
CCITT image compression related issues fixed
LZW compressed PDF support improved
improved support for shading objects
improved PDF fonts support 
improved support for PDF with 4 bit images


2.30.594 (September 18, 2012)
PDF data extraction improved
memory and speed optimizations
fixing issue with empty data while extracting data from some PDF files
improved images extraction support (more image encoding variations are supported)
minor updates in examples
minor bug-fixes

2.30.568 (June 21, 2012)
pdf to text conversion quality improved
multithreading usage stability has been improved
hanging issue on some PDF fixed
PDF Extractor SDK: updated sample for StructuredExtractor (previously known as TableExtractor interface)
minor fixes and improvements


2.20.0.539 (May 4, 2012)
improved stability
demo utility improved
important security fixes


2.20.525 (April 14, 2012)
improved speed (up to x2 faster on some documents)
Tables detection improved
updated PDF Viewer utility
improved support for structured text extraction (CSV and XML data extraction)
minor bug-fixes

2.20.458 (February 2, 2012)
minor fixes in TableDetector class (.TableDetectionMinNumberOfColumns and .TableDetectionMinNumberOfRows were working incorrectly)
improved text extraction for PDF files generated from text files
improved support for PDF files produced by Adobe Acrobat
PDF Viewer: CSV, XML and Text extractor forms updated to show .PreserveFormattingOnTextExtraction option
minor fixes in .NET 4.0 assemblies
Renderer SDK adds /Visual Basic/PDF To BMP using streams/ sample
improved support for PDF with forms objects
improved leading spaces format detection in text extraction
.SetExtractionArea() added to define area on a page to work with in PDF Renderer SDK
improved fonts information reading support in PDF files
new .PageSeparator property in TextExtractor allowing to define a separator string for pages if you need one
fixing issue with indexed colorspaces in PDF
improved PDF format support

2.20.415 (December 21, 2011)
PDF Extractor SDK: minor update for PDF to XLS sample
rendering: improved fonts support
text extraction with formatting improved
new source code sample to show how to save extracted text to a stream
performance optimized and pdf processing speed improved
improved support for PDF format

2.20.396 (November 30, 2011)
fixing issues with CSV, XML and XLS extraction on long tables
PDF Viewer now provides ability to turn on/off text formatting support on extraction
PDF support improved
minor bug-fixes

2.20.392 (November 25, 2011)
NEW table detection implemented, see new Bytescout.PDFExtractor.TableDetector interface and source code samples in /Find Table And Extract As CSV/ sub-folder in examples
NEW regular expressions support for text search in TextExtractor (see .RegexSearch property)
Text search functionality improved
minor bug-fixes
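
To show what regular-expression search over extracted text looks like in general, here is a sketch that uses Python's standard re module on page text that has already been extracted. It does not use the SDK; the function name and the list-of-strings input are assumptions for illustration.

```python
import re


def find_in_pages(pages, pattern):
    """Return (page_index, character_offset, matched_text) for every match.

    pages is a list of strings, one per page, holding already extracted text.
    """
    compiled = re.compile(pattern)
    hits = []
    for page_index, text in enumerate(pages):
        for match in compiled.finditer(text):
            hits.append((page_index, match.start(), match.group(0)))
    return hits


# Usage: find four-digit invoice numbers across three pages of text.
sample_pages = ["Invoice INV-1001 due", "no match here", "INV-2002 and INV-3"]
invoice_hits = find_in_pages(sample_pages, r"INV-\d{4}")
```

A pattern such as INV-\d{4} picks up only complete four-digit identifiers, which a plain substring search for "INV-" cannot distinguish from shorter ones.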

2.10.303 (October 4, 2011)
NEW: DocumentMerger and DocumentSplitter interfaces and classes to merge and split PDF documents
improved support for PDF documents
PDF processing speed increased
minor bug-fixes

2.10.276 (August 26, 2011)
NEW: AttachmentExtractor interface to extract file attachments and embedded files from PDF (see /Examples/Extract Attachments/ for sample source code)
NEW: XLSExtractor interface to extract tables from PDF as XLS and XLSX Excel files (including font formatting)
improved text extraction functionality
improved output image quality
improved support of Unicode text
improved support of damaged PDF files (not hanging on damaged files anymore)

2.00.228 (12 July 2011)
CSVExtractor: SeparationSymbol and QuotationSymbol properties were added
TrimValues property for CSVExtractor and XMLExtractor: turned on by default to trim detected cell values automatically
Default properties for CSV extraction improved
fixed incorrect default space ratio in text extractor: it is now 0.4; the previous value of 1.2 was causing some words to be joined into a single one
TextExtractor.detectNewColumnBySpacesRatio renamed to .SpaceRatioBetweenWords property
PDFViewer now shows options dialog to adjust SpaceRatioBetweenWords if needed
minor bug-fixes
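
Why a space ratio of 1.2 joins words while 0.4 separates them can be seen in a small sketch. It assumes, hypothetically, that the ratio is compared against the gap between adjacent glyphs relative to the glyph width; the SDK's exact definition is not given in these notes.

```python
def group_into_words(glyphs, space_ratio=0.4):
    """Group glyphs on one line into words.

    glyphs: list of (char, x_start, x_end) tuples sorted left to right.
    A new word starts when the gap before a glyph exceeds
    space_ratio * that glyph's width (an assumed rule, for illustration).
    """
    words = []
    current = ""
    previous_end = None
    for char, x_start, x_end in glyphs:
        width = x_end - x_start
        if previous_end is not None and (x_start - previous_end) > space_ratio * width:
            words.append(current)
            current = ""
        current += char
        previous_end = x_end
    if current:
        words.append(current)
    return words


# Four glyphs, each 10 units wide, with a 6-unit gap between "b" and "c".
line = [("a", 0, 10), ("b", 10, 20), ("c", 26, 36), ("d", 36, 46)]
```

With this input, a ratio of 0.4 sets the threshold at 4 units, so the 6-unit gap splits the line into two words; a ratio of 1.2 sets it at 12 units, so the same gap is ignored and the glyphs merge into one word.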

2.00.217 (21 June 2011)
CSV and XML extraction speed greatly improved
CSVExtractor and XMLExtractor classes add new .DetectNewColumnBySpacesRatio property: use this property to control space between detected columns of text
XML and CSV Extractor adds .SkipCellsWithEmptyValues property (true by default to skip cells with empty values)
PDF Viewer now shows extraction options dialog for XML and CSV export functions
PDF To CSV to XLS source code sample added
PDF To CSV\Delphi\ source code sample added
minor bug-fixes and improvements

2.00.206 (6 June 2011)
support for .NET 3.5, .NET 4.00 added
Delphi source code sample has been added
minor bug-fixes and improvements

2.00.186 (May 16, 2011)
pdf processing speed increased up to 10 times
minor bug-fixes and improvements

1.10.168 (May 6 2011)
support for password protected PDF documents improved (was not working properly in previous release)
minor bug-fixes and improvements

1.10.160 (12 April 2011)

XML comments are available now to show hints for methods, classes and properties in Visual Studio
New property: .ExtractColumnByColumn (false default), set to True to extract text column by column instead of line by line
PDF Viewer freeware utility updated to feature "Extract Text (line by line)" and "Extract Text (column by column)" buttons
improved support for single paged PDF documents produced by Acrobat Distiller software
clipping issues were fixed 
fixed hanging on some broken PDF documents 
improved text decoding support
minor bug-fixes


1.10.150 (10 March 2011)
* PDF files support improved
+ now handles PDF files from Google Docs without errors
* minor bug-fixes

1.10.144 (26 February 2011)
+ now works with secured documents (provide password if needed in .Password property)
+ minor bug-fixes and improvements
+ updated GUI demo application

1.10.121 (11 February 2011)
+ PDF to CSV extractor added
+ PDF to XML extractor added
+ support for invisible text extraction added
+ minor bug-fixes and improvements


1.00.30 (9 November 2010)
+ new version

 Dependencies

This package has no dependencies.

 Version History


Version        Downloads   Last updated
10.1.0.3444    202         20 days ago
10.1.0.3439    50          21 days ago
10.0.0.3429    124         a month ago
10.0.0.3427    43          a month ago
10.0.0.3424    52          a month ago
10.0.0.3423    39          a month ago
10.0.0.3422    39          a month ago
10.0.0.3421    59          a month ago
9.4.0.3398     139         a month ago
9.3.0.3366     317         2 months ago
9.3.0.3357     142         3 months ago
9.3.0.3354     73          3 months ago
9.2.0.3293     700         5 months ago
9.2.0.3262     364         6 months ago
9.2.0.3259     141         6 months ago
9.1.0.3170     687         9 months ago
9.1.0.3167     237         9 months ago
9.1.0.3165     165         9 months ago
9.1.0.3163     189         9 months ago
9.0.0.3095     1,253       4/23/2018
9.0.0.3087     402         4/13/2018
9.0.0.3080     223         4/11/2018
8.8.1.3046     616         2/20/2018
8.8.1.3025     607         1/29/2018
8.8.0.3021     265         1/23/2018
8.7.0.2981     813         11/8/2017
8.6.0.2917     1,214       8/2/2017
8.6.0.2912     242         8/1/2017
8.5.0.2863     479         6/9/2017
8.5.0.2861     293         6/8/2017
8.5.0.2856     285         6/1/2017
8.4.1.2829     4,546       4/12/2017
8.4.0.2821     319         3/29/2017
8.3.0.2809     579         3/13/2017
8.3.0.2806     245         3/12/2017
8.3.0.2803     259         3/6/2017
8.3.0.2801     238         3/6/2017
8.3.0.2800     236         3/6/2017
8.3.0.2798     230         3/6/2017
8.3.0.2796     233         3/6/2017
8.3.0.2794     234         3/6/2017
8.2.0.2699     589         1/11/2017
8.1.1.2606     929         10/25/2016
8.1.0.2600     303         10/21/2016
8.0.0.2542     475         9/1/2016
8.0.0.2541     284         9/1/2016
8.0.0.2528     330         8/23/2016
8.0.0.2523     290         8/19/2016
7.0.0.2493     10,727      6/27/2016
7.0.0.2489     257         6/27/2016
7.0.0.2480     632         6/10/2016
7.0.0.2474     516         5/26/2016
6.30.0.2421    479         3/24/2016
6.20.0.2354    487         1/20/2016
6.12.0.2239    3,232       9/22/2015
5.20.0.1871    953         2/5/2015
5.0.0.1626     992         8/14/2014
4.0.0.1487     544         5/31/2014
3.40.0.1349    649         3/11/2014
3.20.0.1092    560         8/5/2013
3.20.0.1075    1,060       7/12/2013
3.10.0.1051    491         6/29/2013
3.0.0.839      578         3/26/2013
2.50.0.769     510         2/25/2013
