Sentiment Analysis with Baidu AIP
Since Baidu has used deep learning to train a sentiment analysis model, it is sensible to deploy this well-performing model for financial sentiment analysis.
Doing so takes several steps.
Preparation
The first step is to prepare your raw data (see the article on how to use a web crawler: Webcrawler). I have already used a web crawler to obtain more than 500k comments from the EastMoney website, so here I simply apply the Baidu Cloud API to analyse that data.
Then go to the official Baidu Cloud page, 情感倾向分析_情感倾向分析算法-百度AI开放平台 (baidu.com), and register a new account, which comes with 500k free requests.
Usage of SDK
The SDK is an easy way to use the Baidu AIP API. Open CMD on Windows and type pip install baidu-aip.
If the chardet library is not installed, install it first by typing pip install chardet.
After this, create a new Python file and import the Baidu Cloud API:
from aip import AipNlp
import pandas as pd

""" generate a client instance """
APP_ID = '...'
API_KEY = '...'
SECRET_KEY = '...'  # obtained from the console of your Baidu account
client = AipNlp(APP_ID, API_KEY, SECRET_KEY)

""" start analysis """
df_source = pd.read_csv('...')  # the file path of your raw data
df_source = pd.concat([df_source, pd.DataFrame(columns=['content2'])], sort=False)
df_source = pd.concat([df_source, pd.DataFrame(columns=['items'])], sort=False)
df_source["title+content"] = df_source["title"] + ', ' + df_source["content"]
# fill null values with a special placeholder string, since the API rejects empty text
df_source['title+content'] = df_source['title+content'].fillna('特殊占位符,用来填补空值')
df_test = df_source
index = 0
for i in range(2153, len(df_test["title+content"])):  # adjust the start index to resume a partial run
    global_df = pd.DataFrame(columns=['title', 'content', 'time', 'sentiment'])
    text_1 = df_test.loc[i, "title+content"]
    print(text_1)
    result = client.sentimentClassify(text_1)
    title = df_test.loc[i, "title"]
    content = df_test.loc[i, "content"]
    time = df_test.loc[i, "time"]
    index = i + 1
    senti = result['items']
    global_df.loc[len(global_df)] = [title, content, time, senti]
    # save each record immediately, so a crash does not lose progress
    global_df.to_csv(f'D:/comments_temp/{index}.csv', encoding='utf8', index=False)
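Because each record is written to its own CSV, the per-record files need to be merged back afterwards. A minimal sketch (the `merge_parts` name and the glob pattern are my own; adjust the path to match the directory used above):

```python
import glob
import pandas as pd

def merge_parts(pattern):
    """Concatenate the per-record CSVs back into one data frame."""
    files = sorted(glob.glob(pattern))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# merged = merge_parts('D:/comments_temp/*.csv')
# merged.to_csv('Sentiment.csv', encoding='utf-8-sig', index=False)
```

Note that `sorted` here sorts filenames lexicographically, so `10.csv` comes before `2.csv`; if the original row order matters, sort by the numeric part of the filename instead.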
Tips: on the free tier every account has a quota of 2 queries per second (QPS), which is unbearable for a large data set. To extend it, you can pay just 1 yuan and opt in to 'pay per query', which raises the quota to 20 QPS. At 20 QPS there is no need for any sleep call in the for loop.
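If you stay on the free 2 QPS tier, you do need client-side throttling. A minimal sketch (the `Throttle` class and `min_interval` name are my own, not part of the baidu-aip SDK):

```python
import time

class Throttle:
    """Block so that calls are spaced at least 1/qps seconds apart."""
    def __init__(self, qps):
        self.min_interval = 1.0 / qps
        self._last = 0.0

    def wait(self):
        delay = self.min_interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()

throttle = Throttle(qps=2)  # free tier: 2 queries per second
# inside the loop, before each request:
# throttle.wait()
# result = client.sentimentClassify(text_1)
```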
Attention: if a json.decoder.JSONDecodeError is raised, the connection to the server failed. To handle this, import the json library (import json) and use try...except to retry the request against the Baidu server:
for i in range(60891, len(df_test["title+content"])):
    text_1 = df_test.loc[i, "title+content"]
    print(text_1)
    try:
        result = client.sentimentClassify(text_1)
    except json.decoder.JSONDecodeError:
        print('connection failed, retrying...')
        result = client.sentimentClassify(text_1)
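A single retry still crashes if the second attempt also fails. A slightly more robust sketch with bounded retries and a delay (the helper name, retry count, and delay are my own choices, not part of the SDK):

```python
import json
import time

def classify_with_retry(client, text, max_retries=3, delay=1.0):
    """Retry sentimentClassify when the response fails to decode."""
    for attempt in range(max_retries):
        try:
            return client.sentimentClassify(text)
        except json.decoder.JSONDecodeError:
            print(f'connection failed (attempt {attempt + 1}), retrying...')
            time.sleep(delay)
    raise RuntimeError(f'giving up after {max_retries} attempts')
```

In the loop, `result = classify_with_retry(client, text_1)` then replaces the bare `sentimentClassify` call.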
Result
The result of each request is a dictionary with two keys, text and items, where items in turn contains nested dictionaries of sentiment fields. You therefore need a script to rearrange the data frame if you want flat columns.
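A sketch of flattening one response into a flat record (the exact field names inside items are assumptions based on the description above; check them against your actual responses):

```python
import pandas as pd

def flatten_result(result):
    """Merge the nested 'items' fields into one flat record."""
    record = {'text': result.get('text', '')}
    items = result.get('items') or [{}]
    # items may be a list of dicts or a single dict; take the first entry
    first = items[0] if isinstance(items, list) else items
    record.update(first)
    return record

# hypothetical response mimicking the structure described above
sample = {'text': '股价大涨', 'items': [{'sentiment': 2, 'positive_prob': 0.95}]}
df = pd.DataFrame([flatten_result(sample)])
```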
We should then group all of these records by stock code:
import pandas as pd

# first split each stock code into its own file
df = pd.read_csv("Sentiment.csv")
code_list = set(df[df.columns[0]])
for j in code_list:
    df_code = df.loc[df[df.columns[0]] == j]
    df_code.to_csv(f"stock_code/{j}.csv", encoding="utf-8-sig", index=False)
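The same split can also be written with pandas groupby, which scans the frame once instead of once per code (the `split_by_code` helper is my own name; a sketch assuming the same column layout):

```python
import pandas as pd

def split_by_code(df):
    """Group rows by stock code (the first column), one frame per code."""
    code_col = df.columns[0]
    return {code: group for code, group in df.groupby(code_col)}

# writing each group out, as above:
# for code, group in split_by_code(df).items():
#     group.to_csv(f"stock_code/{code}.csv", encoding="utf-8-sig", index=False)
```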
For further processing, see: deal with data