BusinessObjects Board

How to avoid duplicates when using a Text Analysis data source?

Hello experts,

I’m developing a universe for analyzing data. As you may know, BO Text Analysis can crawl unstructured documents or web sites. With such a data source you can end up with duplicate entries in the database.

I’ve read that the best practice is to use an ETL tool to remove those duplicates. At designer level, using count distinct could be one option. Are there any other methods of reducing duplicate rows in designer or at report level?

Thanks
Lamine


lamso (BOB member since 2008-04-03)

When you say duplicates, what exactly do you mean? If you have two rows for the same employee with the salary for all the months, then you can bring out the salary of the most recent month at universe level and at report level too.


Jansi :india: (BOB member since 2008-05-12)

Hello Jansi and all,

thanks for the helpful information. I have 2 tables. The first table, phrase_type, contains the following information: positiv, negativ, neutral sentiment. The second table contains phrases which have been extracted from different data sources such as text documents, web sites, etc. In the phrase table I have redundant entries. For instance, in the phrase table “I like lady gaga songs” shows up 3 times. If the same phrase_type_id and the same phrase appear many times, I would like to show only one of them. Please see the attached picture.

Thanks
Lamine
TA VOC.JPG


lamso (BOB member since 2008-04-03)

In your picture, what does the ID represent? The uniqueness in that column is what’s causing the duplicates.


digpen :us: (BOB member since 2002-08-15)

Hello,
ID is the primary key of the table phrase.


lamso (BOB member since 2008-04-03)

Hello,
I would like to create a report like this:

URL            Phrase                    Type

Twitter.com    I like lady gaga songs    Positiv Sentiment
               I don’t like lady gaga    Negativ Sentiment

Facebook       I like lady gaga songs    Positiv Sentiment
               I don’t like lady gaga    Negativ Sentiment

If you look at the table (see table on the left side of the picture), the phrase “I like lady gaga songs” is showing 3 times. I would just like to show it 1 time.


lamso (BOB member since 2008-04-03)

Hi,

As per my understanding, the second table doesn’t maintain any uniqueness and also is not referencing any values from first table.

You have to select the distinct phrase and type by joining both tables on the primary key / foreign key relation (if phrase_type_id in the second table is a foreign key to the first table).

The attached test script may help.
duptest.txt (0.0 KB)
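
The contents of the attached script are not reproduced here. As a rough sketch of the approach described above, the following uses Python's built-in sqlite3 module with an in-memory database. The column names (url, phrase, type_name) and the sample rows are assumptions modelled on the report layout earlier in the thread, not the actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions based on the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phrase_type (
        phrase_type_id INTEGER PRIMARY KEY,
        type_name      TEXT NOT NULL
    );
    CREATE TABLE phrase (
        id             INTEGER PRIMARY KEY,   -- unique per row, so it defeats DISTINCT
        url            TEXT NOT NULL,
        phrase         TEXT NOT NULL,
        phrase_type_id INTEGER REFERENCES phrase_type(phrase_type_id)
    );
    INSERT INTO phrase_type VALUES (1, 'Positiv Sentiment'), (2, 'Negativ Sentiment');
    -- Made-up sample data: the same phrase was extracted three times from one source.
    INSERT INTO phrase (url, phrase, phrase_type_id) VALUES
        ('Twitter.com', 'I like lady gaga songs', 1),
        ('Twitter.com', 'I like lady gaga songs', 1),
        ('Twitter.com', 'I like lady gaga songs', 1),
        ('Twitter.com', 'I don''t like lady gaga', 2),
        ('Facebook',    'I like lady gaga songs', 1),
        ('Facebook',    'I don''t like lady gaga', 2);
""")

# Join on the foreign key and select only the columns the report needs.
# The id column is deliberately left out of the SELECT list.
DEDUP_SQL = """
    SELECT DISTINCT p.url, p.phrase, t.type_name
    FROM phrase AS p
    JOIN phrase_type AS t ON t.phrase_type_id = p.phrase_type_id
    ORDER BY p.url, p.phrase
"""
rows = conn.execute(DEDUP_SQL).fetchall()
for row in rows:
    print(row)
```

The important detail is that the primary key ID stays out of the select list: because it is unique for every row, including it would make each row distinct again and the repeated phrases would come back.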


maddyk (BOB member since 2011-02-08)