How do you extract text from a PDF file in Delphi?

Hi,

I've had a look round the web and drew a blank. Can anyone help?

TIA

Mark Patterson
-1
Mark
11/9/2010 11:45:57 AM
embarcadero.delphi.non-tech

Hi,

Yes, this is quite difficult. Most tools are expensive, I have not found a cheap
way yet, and Adobe's free OCX is very limited.

First of all you need to understand that PDF is a page-description format derived
from PostScript, not a text file. Some PDFs contain no text at all, only pictures;
that depends on the scanner or the PDF writer that was used.
You can normally check this by opening the PDF in Adobe Reader and trying to
select the text: do you get a text cursor (text) or only a rectangle selection (picture)?

It is possible that you can select text only because the application is doing OCR on the fly
to extract it. There are also PDFs that mix text and pictures.
Finally, there are compressed and uncompressed PDFs (this also depends on the
scanner / PDF writer).

There is a good PDF tool that can uncompress a PDF so that its page
content becomes readable as text (if possible, see above).

Download pdftk
http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
and read the help file.

If you want to search in the PDF, you can do it in a dirty way.

Open your PDF in Notepad. Do you see any readable characters?
If not, try the uncompress operation of pdftk.
If you still can't read the text, the PDF may be a picture.
If you can read the text (with extra code characters in between), you
can try to write your own parser in Delphi to search for it.

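{code}
// Returns True if SearchText occurs on any line of the (uncompressed) PDF.
function PdfContainsText(const FFileName, SearchText: string): Boolean;
var
  ReadFile : TextFile;
  TextLine : string;
begin
  Result := False;
  AssignFile(ReadFile, FFileName);
  // Treat a bare LF as the line end; must be set before the file is opened.
  SetLineBreakStyle(ReadFile, tlbsLF);
  Reset(ReadFile);
  try
    while not Eof(ReadFile) do
    begin
      ReadLn(ReadFile, TextLine);
      // Use Pos, Copy and Delete here to pick the text out of the line.
      if Pos(SearchText, TextLine) > 0 then
      begin
        Result := True;
        Break;
      end;
    end;
  finally
    CloseFile(ReadFile);
  end;
end;
{code}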
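
As a rough illustration of the "write your own parser" idea: in an uncompressed page stream, the text shown on the page usually sits in literal strings between parentheses, followed by an operator such as Tj or TJ. The sketch below pulls those strings out of one line. The function name ExtractPdfLiterals is made up for this example, and it is only a starting point: it does not decode octal or \n-style escapes, ignores hex strings in angle brackets, and many PDFs use font encodings in which the bytes inside the parentheses are not readable characters at all.

```pascal
// Hypothetical helper: collects the literal strings "( ... )" found in one
// line of an uncompressed PDF content stream.
function ExtractPdfLiterals(const Line: AnsiString): AnsiString;
var
  I, Depth: Integer;
  C: AnsiChar;
begin
  Result := '';
  Depth := 0; // > 0 while we are inside a literal string
  I := 1;
  while I <= Length(Line) do
  begin
    C := Line[I];
    if (C = '\') and (Depth > 0) then
    begin
      // Escaped character: copy the next one as-is (covers \( \) and \\).
      Inc(I);
      if I <= Length(Line) then
        Result := Result + Line[I];
    end
    else if C = '(' then
    begin
      if Depth > 0 then
        Result := Result + C; // nested parenthesis belongs to the text
      Inc(Depth);
    end
    else if (C = ')') and (Depth > 0) then
    begin
      Dec(Depth);
      if Depth > 0 then
        Result := Result + C;
    end
    else if Depth > 0 then
      Result := Result + C;
    Inc(I);
  end;
end;
```

Called on each TextLine inside the read loop, this gives you the candidate text to run Pos against, instead of searching the raw line with its operators and numbers.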
0
Robert
11/9/2010 2:11:31 PM
Hi

I use this:

http://www.verypdf.com/
http://www.verypdf.com/pdf2txt/pdf2txt.htm

Thousands of PDFs... without problems.

Nils


> HI,
>
> I've had a look round the web and drew a blank. Can anyone help?
>
> TIA
>
> Mark Patterson
0
Utf
11/9/2010 2:34:18 PM
> 
> I've had a look round the web and drew a blank. Can anyone help?
> 


Try searching on "Delphi ifilter"

David
0
David
11/9/2010 2:57:21 PM
On 09.11.2010 12:45, Mark Patterson wrote:

> I've had a look round the web and drew a blank. Can anyone help?

In addition to the (mostly commercial) libraries for Delphi and 
depending on target system configuration and programming language 
knowledge, another option would be to simply call a non-Delphi application.
There are many high quality open source PDF libraries for the Java 
platform, like JPedal with text extraction functions documented in 
http://www.jpedal.org/support_Extraction.php

Hope this helps
-- 
Michael Justin
habarisoft - Enterprise Messaging Software for Delphi®
http://www.habarisoft.com/
0
Michael
11/9/2010 5:17:31 PM
> Yes this is quite difficult. The most tools are expensive and there is no cheap 
> way I have found yet and also the free OCX of adobe is very limited.
> 
> First of all you need to understand that PDF is a postscript language and
> not a textfile. Some PDF have no text at all but only contain pictures and that depends 
> on the scanner or the PDF writer that is used. 
> Normally you see this by opening the pdf in Adobe reader and see if there is a way to 
> select the text. Is there a textcursor select (text) or rectangle select (picture)

The one I was testing on definitely allows select and copy.
 
> It is possible that you can select a text but the application is doing an OCR on the fly
> to extract the text. Then also you have PDF's that are both, text and pictures mixed.
> At last you have compressed PDF's and non-compressed PDF's. (also depends on the 
> scanner / PDF writer)
> 
> There is a good PDF tool that allows you to save the PDF as
> readable text. (If possible, see above..)
> 
> Download the PDFTK
> http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
> and read the helpfile.

I tried that but it didn't seem to do anything.
0
Mark
11/10/2010 4:44:52 AM
> {quote:title=Nils Bödeker wrote:}{quote}
> Hi
> 
> i use this:
> 
> http://www.verypdf.com/
> http://www.verypdf.com/pdf2txt/pdf2txt.htm
> 
> ttousends of PDFs... with out problems...

Ta, I tried it and found that it is limited in the length it will process on a given file unless you pay.
0
Mark
11/10/2010 4:47:17 AM
> {quote:title=David Wilcockson wrote:}{quote}
> 
> Try searching on "Delphi ifilter"

Thanks, I tried that, especially the code here: http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_20293579.html

But for the PDF I wanted, it only output what looks like a header. It even finishes with the words
"Regards Carmel Mulhern Company Secretary"
as if this is all that we are supposed to get. It's a company's Financial Results, and I am trying to get the data from it.

Regards
Mark
0
Mark
11/10/2010 4:52:21 AM
Mark Patterson wrote:
> HI,
> 
> I've had a look round the web and drew a blank. Can anyone help?
> 

I've spent a lot of time on this and have found two commercial libraries
that I use.

You might want to take a look at these: QuickPDF, Gnostice PDF Toolkit
0
Thomas
11/10/2010 5:29:26 AM
Hi,

Did you try to uncompress the PDF first?

You can also try converting the PDF to another (lower) version.
If you open the PDF in a text editor (Notepad), the first
characters in the file give the version number (for example %PDF-1.4).
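
The version check can also be done from Delphi instead of Notepad. This is a minimal sketch, assuming the file begins directly with the %PDF- signature; the names ParsePdfVersion and ReadPdfVersion are invented for the example, and ReadPdfVersion needs Classes and SysUtils in the uses clause.

```pascal
// Hypothetical helper: returns the version from a header such as '%PDF-1.4',
// or '' if the data does not start with the '%PDF-' signature.
function ParsePdfVersion(const Header: AnsiString): AnsiString;
const
  Signature: AnsiString = '%PDF-';
var
  I: Integer;
begin
  Result := '';
  if Copy(Header, 1, Length(Signature)) <> Signature then
    Exit;
  I := Length(Signature) + 1;
  // The version is a run of digits and dots, e.g. 1.3 or 1.7.
  while (I <= Length(Header)) and (Header[I] in ['0'..'9', '.']) do
  begin
    Result := Result + Header[I];
    Inc(I);
  end;
end;

// Reads the first 16 bytes of the file in binary mode and parses them.
// Requires Classes (TFileStream) and SysUtils (fmOpenRead, fmShareDenyNone).
function ReadPdfVersion(const FileName: string): AnsiString;
var
  Stream: TFileStream;
  Header: AnsiString;
  Count: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Header, 16);
    Count := Stream.Read(Header[1], Length(Header));
    SetLength(Header, Count);
  finally
    Stream.Free;
  end;
  Result := ParsePdfVersion(Header);
end;
```

An empty result means the file does not look like a PDF at all, which is worth ruling out before trying any of the tools.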

1) Use the tool to uncompress (to be sure).
2) Convert to version 1.3.
3) Use the tool to extract the text.

Maybe your PDF is just a picture and the text selection
tool in your PDF viewer is doing OCR after
the selection.

It is difficult to say without an example.
0
Robert
11/10/2010 8:39:16 AM
I am using http://www.foolabs.com/xpdf/, convert PDF to text and then parse it...
0
Utf
11/10/2010 9:08:15 AM
Mark Patterson <> wrote in news:304270@forums.embarcadero.com:

>> {quote:title=David Wilcockson wrote:}{quote}
>> 
>> Try searching on "Delphi ifilter"
> 
> Thanks, I tried that, especially the code here:
> http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_20293579.html
> 
> But for the pdf I wanted it only output what looks like a header. It
> even finishes with the words "Regards Carmel Mulhern Company
> Secretary" as if this is all that we are supposed to get. It's a
> company's Financial Results, and I am trying to get the data from it. 
> 
> Regards
> Mark

Could it be that the text you want is included as an image?
0
Christopher
11/10/2010 9:24:20 AM
Nils Bödeker wrote:

> i use this:
> 
> http://www.verypdf.com/
> http://www.verypdf.com/pdf2txt/pdf2txt.htm

Developer licence $2000.  OUCH!!!

-- 
Andy Syms
Technosoft Systems Ltd
0
Andy
11/10/2010 10:15:34 AM
> {quote:title=Stanko Milošev wrote:}{quote}
> I am using http://www.foolabs.com/xpdf/, convert PDF to text and then parse it...

+1 

It's free, you just have to call one exe, and you'll get your text in a file. Very easy to use with Delphi.
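
For anyone who wants to try it, calling the exe from Delphi could look roughly like the console program below. It is a sketch under a few assumptions: Windows only, the pdftotext.exe from the Xpdf download is passed in by path, and the names PdfToTextDemo and RunPdfToText are made up here. The -layout option asks pdftotext to keep the page layout, which tends to help with tables such as financial results; drop it if plain reading order works better for your files.

```pascal
program PdfToTextDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Runs: "<ExePath>" -layout "<PdfFile>" "<TxtFile>" and waits for it to end.
// Returns True when the process started and exited with code 0.
function RunPdfToText(const ExePath, PdfFile, TxtFile: string): Boolean;
var
  StartInfo: TStartupInfo;
  ProcInfo: TProcessInformation;
  CmdLine: string;
  ExitCode: DWORD;
begin
  Result := False;
  CmdLine := Format('"%s" -layout "%s" "%s"', [ExePath, PdfFile, TxtFile]);
  UniqueString(CmdLine); // CreateProcess may write to the command-line buffer
  FillChar(StartInfo, SizeOf(StartInfo), 0);
  StartInfo.cb := SizeOf(StartInfo);
  StartInfo.dwFlags := STARTF_USESHOWWINDOW;
  StartInfo.wShowWindow := SW_HIDE; // keep the console window out of sight
  if not CreateProcess(nil, PChar(CmdLine), nil, nil, False, 0, nil, nil,
    StartInfo, ProcInfo) then
    Exit;
  try
    WaitForSingleObject(ProcInfo.hProcess, INFINITE);
    if GetExitCodeProcess(ProcInfo.hProcess, ExitCode) then
      Result := ExitCode = 0;
  finally
    CloseHandle(ProcInfo.hProcess);
    CloseHandle(ProcInfo.hThread);
  end;
end;

begin
  if ParamCount < 3 then
    WriteLn('Usage: PdfToTextDemo <pdftotext.exe> <input.pdf> <output.txt>')
  else if RunPdfToText(ParamStr(1), ParamStr(2), ParamStr(3)) then
    WriteLn('Text written to ', ParamStr(3))
  else
    WriteLn('pdftotext failed or could not be started');
end.
```

Once the text file exists, it can be read with a TStringList or a plain ReadLn loop and parsed like any other text.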
0
Arnaud
11/10/2010 4:19:12 PM