Hi,
I am planning to extract Twitter data using a User-Defined transform, starting from the blueprints job. I also need to extract some additional fields such as followers_count, retweet_count, friend_count, etc.
I have modified the Python script to add those fields, as shown below.
# tweet information
for innerIndex in (range(len(search_results))):
    sResult = search_results[innerIndex]
    id = sResult[u'id']
    if str(id) <= self.max_id:
        raise SearchError('Stopping the search for this term due to repeated data being found.')
    if min_id is None:
        min_id = long(id)
    else:
        if min_id > long(id):
            min_id = long(id)
    next_max_id = long(min_id) - 1
    created_at = sResult[u'created_at']
    if created_at is None:
        continue
    created_at = time.strftime('%a, %d %b %Y %H:%M:%S +0000', time.strptime(sResult[u'created_at'], '%a %b %d %H:%M:%S +0000 %Y'))
    user = sResult[u'user']
    metadata = sResult[u'metadata']
    from_user = user[u'name']
    if from_user is None:
        continue
    from_user_id = user[u'id']
    from_user_name = user[u'screen_name']
    #friend_count = user[u'friend_count']
    favourite_count = sResult[u'favourite_count']
    #followers_count = user[u'followers_count']
    retweet_count = sResult[u'retweet_count']
    if sResult[u'geo'] is None:
        coordinates_lat = ''
        coordinates_long = ''
        coordinates_type = ''
    else:
        geo_result = sResult['geo']
        coordinates_lat = geo_result['coordinates'][0]
        coordinates_long = geo_result['coordinates'][1]
        coordinates_type = geo_result[u'type']
    id_str = sResult[u'id_str']
    iso_language_code = metadata[u'iso_language_code']
    if 'place' not in sResult or sResult[u'place'] is None:
        place = None
    else:
        place = sResult['place'][u'full_name']
    text = sResult[u'text']
    text = text.replace('\n', '')
    if len(text) > 300:
        continue
    try:
        DSRecord = DataManager.NewDataRecord(1)
        DSRecord.SetField(u'MAX_ID', unicode(max_id))
        DSRecord.SetField(u'CREATED_AT', unicode(created_at))
        DSRecord.SetField(u'FROM_USER', unicode(from_user_name))
        DSRecord.SetField(u'FROM_USER_ID', unicode(from_user_id))
        DSRecord.SetField(u'FROM_USER_ACCOUNT', unicode(from_user))
        DSRecord.SetField(u'COORDINATES_LAT', unicode(coordinates_lat))
        DSRecord.SetField(u'COORDINATES_LONG', unicode(coordinates_long))
        #DSRecord.SetField(u'FRIEND_COUNT ', unicode(friend_count))
        DSRecord.SetField(u'FAVOURITE_COUNT ', unicode(favourite_count))
        #DSRecord.SetField(u'FOLLOWERS_COUNT ', unicode(followers_count))
        DSRecord.SetField(u'RETWEET_COUNT ', unicode(retweet_count))
        DSRecord.SetField(u'ID_STR', unicode(id_str))
        DSRecord.SetField(u'LANGUAGE', unicode(iso_language_code))
        DSRecord.SetField(u'TEXT', unicode(text) if len(text) <= 2000 else unicode(text[:2000]))
        DSRecord.SetField(u'LOCATION', unicode(place))
        DSRecord.SetField(u'SEARCH_TERM', unicode(self.term))
        DSRecord.SetField(u'CHANNEL', unicode('t'))
        DSRecord.SetField(u'PROXY', unicode(self.proxy))
        Collection.AddRecord(DSRecord)
        del DSRecord
But when I execute the job, I get the error below.
6844 7828 DQX-058306 10/23/2016 5:45:00 PM |Sub data flow TdpBlueprintEn_Twitter_Search_V1_1_2|Transform Search_Twitter
6844 7828 DQX-058306 10/23/2016 5:45:00 PM Transform <Get_Search_Tasks>: Traceback (most recent call last):
6844 7828 DQX-058306 10/23/2016 5:45:00 PM File "EXPRESSION", line 402, in
6844 7828 DQX-058306 10/23/2016 5:45:00 PM File "EXPRESSION", line 286, in search
6844 7828 DQX-058306 10/23/2016 5:45:00 PM KeyError: u'favourite_count'.
6844 7828 DQX-058306 10/23/2016 5:45:00 PM |Sub data flow TdpBlueprintEn_Twitter_Search_V1_1_2|Transform Search_Twitter
6844 7828 DQX-058306 10/23/2016 5:45:00 PM Transform <Search_Twitter>: : Error executing the expression.
6844 7828 DQX-058302 10/23/2016 5:45:00 PM |Sub data flow TdpBlueprintEn_Twitter_Search_V1_1_2|Transform Search_Twitter
6844 7828 DQX-058302 10/23/2016 5:45:00 PM Transform <Search_Twitter>: DLL <udt_transformu.dll> runtime function failed with error <3>. More detailed
6844 7828 DQX-058302 10/23/2016 5:45:00 PM information may be obtained from previous errors.
7792 6244 DFC-250038 10/23/2016 5:45:17 PM |Dataflow TdpBlueprintEn_Twitter_Search_V1
7792 6244 DFC-250038 10/23/2016 5:45:17 PM Sub data flow <TdpBlueprintEn_Twitter_Search_V1_1_2> terminated due to error <58302>.
Please help me to solve this issue.
Thanks & Regards,
Ramana.
Ramana (BOB member since 2009-04-30)
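
For reference, the KeyError in the log above means the tweet dictionary has no key named favourite_count. A likely cause, assuming the script reads results from the Twitter REST API v1.1 search endpoint: that API spells the tweet-level field favorite_count, and the user-level counter is friends_count rather than friend_count. The sketch below shows one way to read the four counters without raising when a key is missing. The names extract_counts and sample are hypothetical and not part of the blueprint script, and the sample values are made up for illustration.

```python
def extract_counts(tweet):
    """Read the counters from one search result, using '' for any missing key."""
    # 'user' may be absent or None in a malformed result; fall back to an empty dict
    user = tweet.get('user') or {}
    return {
        # Tweet-level fields (assumed v1.1 spelling: favorite_count)
        'favorite_count': tweet.get('favorite_count', ''),
        'retweet_count': tweet.get('retweet_count', ''),
        # User-level fields (assumed v1.1 spelling: friends_count)
        'followers_count': user.get('followers_count', ''),
        'friends_count': user.get('friends_count', ''),
    }


# Hypothetical sample shaped like one entry of search_results
sample = {
    'id': 1,
    'favorite_count': 3,
    'retweet_count': 5,
    'user': {'followers_count': 120, 'friends_count': 80},
}

counts = extract_counts(sample)
missing = extract_counts({})  # no keys at all: every counter falls back to ''
```

Inside the loop, the same idea would replace sResult[u'favourite_count'] with sResult.get(u'favorite_count', ''), so that a result without the field produces an empty value instead of stopping the job.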