The observer effect in quantum physics holds that the very act of observing a very small particle influences its behavior. Perfect data therefore becomes impossible to generate, since there is no way to know whether the show is the same in a solitary rehearsal as it is under the attentive scrutiny of an audience.
This was one of the challenges that a group of students set out to tackle in the project “Analysis of sentiment based on Twitter posts and their interrelation with the price of crypto-assets”.
The study was entered in a competition organized by Fundação Getúlio Vargas and placed third. The team consisted of Felipe Gabriel (UFSC), Guilherme Terriaga (UMC), Matheus Konstantinidis (UFSC), Pedro H. Anjos (UFSC) and Vinícius Custódio (UDESC).
The objective was to understand the direction of the relationship: does the price of bitcoin rise or fall because of tweets, or are the tweets posted because prices fluctuate?
“In extracting the data, our premise was: the market is made by people, and people are affected by feelings, so do feelings affect the market in any way? We chose Twitter to ‘mine’ these ontologies that represent something, based on the principle that every sentence has a meaning and is an expression of something, in this case, feelings”, says Guilherme Terriaga, a student at UMC, in an interview with Portal do Bitcoin.
Moving from theory to practice, Terriaga explains that the group used a technique called web scraping to capture the tweets and the social network's own API to collect data on the volume of posts.
“So with web scraping, searching for ‘Bitcoin’ and ‘BTC’ in English-language tweets, we generated a dataset and processed it, leaving the data better prepared for analysis,” he said.
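The filtering and cleaning step Terriaga describes can be pictured with a minimal sketch. It assumes the scraped tweets are already available as dictionaries with hypothetical `lang` and `text` fields; the field names, the cleaning rules and the `build_dataset` helper are illustrative assumptions, not the group's actual pipeline.

```python
import re

# Assumed search terms, taken from the interview: "Bitcoin" and "BTC".
KEYWORDS = re.compile(r"\b(bitcoin|btc)\b", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def clean_text(text):
    """Drop URLs and collapse runs of whitespace (illustrative cleaning only)."""
    without_urls = URL.sub("", text)
    return " ".join(without_urls.split())


def build_dataset(raw_tweets):
    """Keep English tweets that mention the keywords, cleaned and de-duplicated.

    `raw_tweets` is an iterable of dicts with hypothetical keys
    "lang" and "text"; the real scraper output may look different.
    """
    seen = set()
    dataset = []
    for tweet in raw_tweets:
        if tweet.get("lang") != "en":
            continue
        text = clean_text(tweet.get("text", ""))
        if not KEYWORDS.search(text) or text in seen:
            continue
        seen.add(text)
        dataset.append(text)
    return dataset
```

Matching on the cleaned text rather than the raw text keeps a tweet from being selected only because a link happens to contain "btc".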
The feelings of each tweet
The data processing that followed collection was explained by Felipe Gabriel, of the Federal University of Santa Catarina (UFSC). The student says that the group assigned a sentiment score to each tweet.
This sentiment score came from a parameter named polarity, which is provided by a Python library called TextBlob and ranges from -1 (most negative) to 1 (most positive).
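As a rough illustration of that scoring step, the sketch below reads the polarity through TextBlob's `sentiment.polarity` attribute. The helper names `textblob_polarity` and `score_tweets` are assumptions made for this example, and the import is deferred so that the rest of the sketch runs even where the third-party package is not installed.

```python
def textblob_polarity(text):
    """Return TextBlob's polarity for `text`, a float between -1.0 and 1.0."""
    # Third-party dependency (pip install textblob), imported lazily.
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def score_tweets(texts, scorer=textblob_polarity):
    """Pair each tweet text with its sentiment score.

    `scorer` is any function mapping a string to a number; it defaults to
    TextBlob polarity but can be swapped for another scorer.
    """
    return [(text, scorer(text)) for text in texts]
```

Calling `score_tweets(["Bitcoin is great", "Bitcoin is terrible"])` would return each sentence alongside its polarity, ready to be attached to the tweet's date and engagement counts.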
“We did this for all the tweets on a given day and averaged those sentiment scores for the day, weighting them by comments, retweets and likes to account for the ‘impact’ of the tweet, shall we say. After that, we aggregated these weighted averages into a sum and treated it as a single indicator, then correlated it with the price to try to validate it”, says Gabriel.
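One possible reading of that description is sketched below: for each day, the polarity scores are averaged three times, weighted in turn by comments, retweets and likes, and the three weighted averages are added into one indicator, which is then compared with the price using a Pearson correlation. The record layout, the function names and the choice of Pearson correlation are assumptions; the interview does not spell out the exact formula.

```python
from collections import defaultdict
from math import sqrt

ENGAGEMENT_KEYS = ("comments", "retweets", "likes")  # assumed field names


def weighted_mean(values, weights):
    """Weighted average; returns 0.0 when all weights are zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


def daily_indicator(tweets):
    """Map each day to the sum of its three engagement-weighted polarity means.

    `tweets` is an iterable of dicts with hypothetical keys
    "day", "polarity", "comments", "retweets" and "likes".
    """
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["day"]].append(tweet)
    indicator = {}
    for day, rows in by_day.items():
        scores = [row["polarity"] for row in rows]
        indicator[day] = sum(
            weighted_mean(scores, [row[key] for row in rows])
            for key in ENGAGEMENT_KEYS
        )
    return indicator


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return cov / spread if spread else float("nan")
```

With the indicator and the closing price aligned by day, `pearson(indicator_values, prices)` yields a single coefficient between -1 and 1.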
But was there any correlation?
“Yes, we got a correlation of 0.77 [on a scale that goes up to 1] over the last four years. It was very significant; however, during long periods of sideways trading it drops, with the market becoming more ‘rational’ as social networks are less busy. That is our hypothesis”.
The Chicken and the Egg
But how can one know which came first, the chicken or the egg, or, for older readers, whether the cookie is “fresh because it sells more or sells more because it is fresh”?
“We ran some correlation tests on moving windows; however, to avoid so-called ‘spurious regression’, the judges recommended that we use statistical tests such as the Durbin-Watson test, which helps check that it was not just coincidence that the results came out so well. In addition, we looked at whether the conclusions of the analyses still hold when we observe series that are further shifted in time, that is, whether the value of our indicator in the previous week still had an impact on the price”, says Gabriel.
The result of the group's work is software that is currently for the team's academic use only, but the group plans to release a commercial version.
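The three checks Gabriel mentions can be sketched in plain Python: a correlation over moving windows, a correlation with the indicator shifted back in time, and the Durbin-Watson statistic computed on the residuals of a simple regression of price on the indicator. The function names, the window and lag parameters and the use of a one-variable least-squares fit are assumptions for illustration, not the group's code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return cov / spread if spread else float("nan")


def rolling_correlation(xs, ys, window):
    """Correlation inside each consecutive window of `window` observations."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]


def lagged_correlation(indicator, price, lag):
    """Correlate the indicator at time t - lag with the price at time t."""
    if lag == 0:
        return pearson(indicator, price)
    return pearson(indicator[:-lag], price[lag:])


def ols_residuals(xs, ys):
    """Residuals of the least-squares line y = a + b * x (xs must not be flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator if denominator else float("nan")
```

The Durbin-Watson statistic lies between 0 and 4. A value near 2 suggests little first-order autocorrelation in the residuals, while a value well below 2 points to positive autocorrelation, a common warning sign that a strong correlation between two trending series may be spurious. With daily data, `lagged_correlation(indicator, price, 7)` corresponds to the one-week shift described in the interview.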