BusinessObjects Board

Reading matrix/table from PDF?

Hi all,

I would like to ask for your help with the following use case.

I have a product price list in PDF form that we need to read using SAP Data Services 4.2 (SP07).
After a few introductory pages, the products and prices are laid out in a matrix/table format for the following 20 or so pages.

I can read the PDF into Data Services as unstructured text. If I then use Text Data Processing, I can extract all the products (we also have a custom dictionary for this).

However, the Entity Extraction process seems to ignore the price data completely.

Is there any way to get the Entity Extraction process to pick up the prices as well and associate them with the product, with the product treated as the topic that the prices relate to?
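
To make the result I am after concrete, here is a minimal Python sketch of the product/price pairing, done outside Data Services. It assumes that each table row comes through as a single line of text, with the price in the last column separated from the product name by at least two spaces. The regex, the function name and the sample rows are hypothetical and not taken from the actual price list.

```python
import re

# Hypothetical row layout (an assumption, not the real price list):
# "<product name><two or more spaces><price such as 1,234.50>"
ROW = re.compile(
    r"^(?P<product>.+?)\s{2,}(?P<price>\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\s*$"
)


def pair_products_with_prices(lines):
    """Return (product, price) tuples for lines that look like table rows.

    Lines that do not match the assumed row layout (for example, the
    introductory pages) are skipped.
    """
    pairs = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            # Drop thousands separators before converting the price.
            price = float(match.group("price").replace(",", ""))
            pairs.append((match.group("product"), price))
    return pairs


if __name__ == "__main__":
    sample = ["Price list 2015", "Widget A    12.50", "Widget B      1,234.00"]
    print(pair_products_with_prices(sample))
```

In other words, each price should end up attached to the product on the same row, rather than being dropped the way Entity Extraction currently does.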

I’ve used SAP Data Services successfully to process other unstructured text, such as Tweets and emails, but it seems to struggle more with “semi-structured” text than with plain sentences.

Any help would be much appreciated!


ErikR :new_zealand: (BOB member since 2007-01-10)

Has no one ever had to read PDF files with tabular data? Ever? :expressionless:


ErikR :new_zealand: (BOB member since 2007-01-10)