I’m developing a universe for analyzing data. As you may know, BO Text Analysis can crawl unstructured documents or websites. With such a data source, the database can end up with duplicate entries.
I’ve read that the best practice is to use an ETL tool to remove those duplicates. At the Designer level, using count distinct could be one option. Are there any other methods of reducing duplicate rows in Designer or at the report level?
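To show what a count-distinct measure does and does not do, here is a minimal sketch using Python's built-in `sqlite3` module rather than a BO universe. The table `phrase` and column names `phrase_text` and `phrase_type_id` are assumptions made for illustration, and the rows are made-up sample data.

```python
import sqlite3

# In-memory database with a hypothetical "phrase" table that contains duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrase (phrase_text TEXT, phrase_type_id INTEGER)")
conn.executemany(
    "INSERT INTO phrase VALUES (?, ?)",
    [
        ("I like lady gaga songs", 1),
        ("I like lady gaga songs", 1),
        ("I like lady gaga songs", 1),
        ("I don't like lady gaga", 2),
    ],
)

# COUNT counts every row; COUNT(DISTINCT ...) counts each distinct value only once.
total, distinct = conn.execute(
    "SELECT COUNT(phrase_text), COUNT(DISTINCT phrase_text) FROM phrase"
).fetchone()
print(total, distinct)  # 4 2
```

Note that count distinct only changes the aggregated number. A detail query listing the phrases would still return the duplicate rows, so on its own it does not remove them from a report.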
When you say duplicates, what exactly do you mean? If you have several rows for the same employee, one with the salary for each month, then you can retrieve only the salary of the most recent month, either at the universe level or at the report level.
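As a rough illustration of the "most recent month" idea in plain SQL, the sketch below keeps one row per employee with a correlated `MAX` subquery, run through Python's `sqlite3`. The `salary` table, its columns, and the data are hypothetical. Months are stored as `YYYY-MM` text so that `MAX` sorts them chronologically.

```python
import sqlite3

# Hypothetical salary table: one row per employee per month.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salary (employee_id INTEGER, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?)",
    [
        (1, "2011-01", 3000.0),
        (1, "2011-02", 3100.0),
        (1, "2011-03", 3200.0),
        (2, "2011-01", 2500.0),
        (2, "2011-02", 2600.0),
    ],
)

# Keep only the row whose month equals the latest month for that employee.
latest = conn.execute(
    """
    SELECT s.employee_id, s.month, s.amount
    FROM salary AS s
    WHERE s.month = (
        SELECT MAX(s2.month) FROM salary AS s2
        WHERE s2.employee_id = s.employee_id
    )
    ORDER BY s.employee_id
    """
).fetchall()
print(latest)  # [(1, '2011-03', 3200.0), (2, '2011-02', 2600.0)]
```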
Thanks for the helpful information. I have 2 tables. The first table, phrase_type, contains the sentiment types: positive, negative and neutral. The second table contains phrases that have been extracted from different data sources such as text documents, websites, etc. The phrase table has redundant entries. For instance, “I like lady gaga songs” appears 3 times in it. If the same phrase_type_id and the same phrase appear several times, I would like to show only one of them. Please see the attached picture.
Source       | Phrase                  | Phrase type
Twitter.com  | I like lady gaga songs  | Positiv Sentiment
             | I don’t like lady gaga  | Negativ Sentiment
Facebook     | I like lady gaga songs  | Positiv Sentiment
             | I don’t like lady gaga  | Negativ Sentiment
In the table on the left side of the picture, the phrase “I like lady gaga songs” appears 3 times. I would like it to appear only once.
As I understand it, the second table doesn’t enforce any uniqueness and doesn’t reference any values from the first table.
You have to select distinct phrase and type values by joining both tables on the primary key/foreign key relation (assuming phrase_type_id in the second table is a foreign key to the first table).
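The join-plus-DISTINCT suggestion can be sketched outside BO with Python's built-in `sqlite3`. Only the table names `phrase_type` and `phrase` and the key `phrase_type_id` come from the thread; the columns `source`, `phrase_text` and `phrase_type_name`, as well as the rows (including the "News site" source), are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE phrase_type (
        phrase_type_id INTEGER PRIMARY KEY,
        phrase_type_name TEXT
    );
    CREATE TABLE phrase (
        source TEXT,
        phrase_text TEXT,
        phrase_type_id INTEGER REFERENCES phrase_type (phrase_type_id)
    );
    INSERT INTO phrase_type VALUES
        (1, 'Positiv Sentiment'), (2, 'Negativ Sentiment'), (3, 'Neutral Sentiment');
    INSERT INTO phrase VALUES
        ('Twitter.com', 'I like lady gaga songs', 1),
        ('Twitter.com', 'I don''t like lady gaga', 2),
        ('Facebook', 'I like lady gaga songs', 1),
        ('Facebook', 'I don''t like lady gaga', 2),
        ('News site', 'I like lady gaga songs', 1);
    """
)

# Join on the primary key/foreign key and keep each (phrase, type) pair once.
# "source" is deliberately left out: including it would make the rows distinct again.
rows = conn.execute(
    """
    SELECT DISTINCT p.phrase_text, t.phrase_type_name
    FROM phrase AS p
    JOIN phrase_type AS t ON p.phrase_type_id = t.phrase_type_id
    ORDER BY p.phrase_text
    """
).fetchall()
for row in rows:
    print(row)
```

The key design point is which columns go into the SELECT list. DISTINCT compares entire rows, so any column that differs between the copies (here the source) has to stay out of the query for the duplicates to collapse.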
The attached test script, duptest.txt, may help.