Roney Fraga Souza

Professor and Data Scientist

I like to investigate economic issues using quantitative methods and computation. My main line of research is technological forecasting through the analysis of curricula, articles and patents. I have a strong interest in network analysis and big data.


  • Introduction to Microeconomics
  • Microeconomics III
  • Economics and Technology
  • Data Science for Economists
  • Data Analysis in R (extension course)


2014/2 - 2015/1

University of Porto

Porto, Portugal.


2012 - 2015

University of Campinas

Economic Development, Campinas, Brazil.

Federal University of Pernambuco

Recife, Brazil

2010 - 2012

Federal University of Mato Grosso

Economic Development, Cuiabá, Brazil.

2004 - 2009

Federal University of Mato Grosso

Economics, Cuiabá, Brazil.


Detection of emerging research lines in networks of scientific publications on bioenergy

Scientific research networks were used to map emerging research areas in bioenergy studies. Approximately 70,000 scientific articles were analyzed. Nine emerging groups of scientific papers were identified, all of them related to third-generation biofuel production technologies. Biodiesel from algae was the most prominent biofuel. The most cited areas of knowledge are metabolic engineering, microbial engineering, cyanobacteria engineering and genetic engineering, associated with biochemical and molecular analyses. The prominence of these areas in third-generation biofuel production indicates that a new field of knowledge is emerging, since these technologies are not linked to first- and second-generation biofuel production technologies.

Is entrepreneurship an emerging field of research?

To identify whether entrepreneurship is an emerging research field, we used an unsupervised method to map the connections among approximately 30,000 scientific articles. The results show that entrepreneurship is not an emerging field of research but a mature one. Studies on the entrepreneurial role of universities are the most recent line of research developed in the entrepreneurship literature. Studies on entrepreneurship in urban spaces were also identified as a new line of research. Finally, studies on entrepreneurship in family firms and on experiences of entrepreneurial success were the most dynamic research areas among those found.

Intellectual Property, Innovation and Development

The study Intellectual Property, Innovation, and Development: Challenges for Brazil investigated the importance of intellectual property for the Brazilian economy and the theoretical discussions on the subject. An examination of granted patents found that public universities continue to play a relevant role in patent registration. This, in turn, means that Brazilian companies innovate little, which consolidates the scenario of commodity production and distances the country from the key areas of Industry 4.0. The delay in examining patent applications, 10 years on average, is one of the bottlenecks detected. This lag in the patenting process contributes to distancing Brazilian industry from the world market. A pioneering analysis of the patent information registered in the Lattes curriculum platform shows that researchers with an academic profile are responsible for two-thirds of the patents granted in Brazil, and that 84% of these researchers have high academic productivity, with an average of 27 articles published in scientific journals. This is evidence that there is no dichotomy between academic production and the generation of intellectual property.


Data Science for technology forecasting via analysis of curricula, articles and patents

BirdDog is a technology forecasting project that maps fields of knowledge along different dimensions. It reads curricula from the Lattes platform to find which professionals work in a given area of knowledge, taking into account the weight of their publications and their co-authorship relations. With articles, data obtained from databases such as Scopus and Web of Science (WoS) can be used to build networks of scientific publications, group articles by similarity of connections, calculate topological measures of the importance of each article, and extract the content of the groups with Natural Language Processing (NLP), among other procedures. The same procedures applied to scientific articles are then applied to patents (USPTO), making it possible to find the frontier of knowledge in the world of patents. All the procedures rely on unsupervised methods, which makes it possible to handle thousands of authors, articles and patents with little processing time.
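The network steps described above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not BirdDog's code: the citation pairs are hypothetical, connected components stand in for "grouping by similarity of connections", and the number of citations received (in-degree) stands in for a topological measure of importance. The specific algorithms used by the project are not stated here.

```python
from collections import defaultdict

# Hypothetical citation pairs: (citing article, cited article).
citations = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("D", "E"), ("F", "E"),
]

def build_graph(edges):
    """Return undirected neighbor sets and citations received per article."""
    neighbors = defaultdict(set)
    in_degree = defaultdict(int)
    for citing, cited in edges:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)
        in_degree[cited] += 1
        in_degree.setdefault(citing, 0)  # make sure uncited articles appear
    return neighbors, dict(in_degree)

def connected_groups(neighbors):
    """Group articles that are linked by any chain of citations."""
    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

neighbors, in_degree = build_graph(citations)
groups = connected_groups(neighbors)

# Most cited article in each group (assumed importance measure: in-degree).
top_articles = [max(group, key=in_degree.get) for group in groups]
print(groups)
print(top_articles)
```

On real data the groups would come from a community detection method and the importance from richer centrality measures; the sketch only shows the shape of the pipeline.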

The most stable and cheapest way to organize multiple computers to solve the same problem is with Linux. For data processing I use computers running Manjaro, and for the web server I use Ubuntu. Day to day I use Manjaro with i3wm.


Working with large amounts of data requires somewhere to store it. The solution is an operating system designed for network data storage. The security and functionality that FreeNAS provides make my life easier.


R is a statistical software environment and programming language that lets you analyze data efficiently with a few lines of code. The wide availability of packages, the fact that it is free software, the community and the flexibility are the strongest reasons to choose R as a work tool.


Database knowledge allows you to store large amounts of data without overloading R's memory. In a test or development environment, SQLite is an excellent aid to the flow of data analysis and processing.
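A minimal sketch of this idea, using Python's standard `sqlite3` module rather than R: the rows live in the database and only the aggregated result is pulled into memory. The `articles` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# An in-memory database keeps the example self-contained;
# a file path would be used for real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, year INTEGER, title TEXT)"
)

# Hypothetical sample rows.
rows = [
    (2010, "Algae biodiesel"),
    (2010, "Metabolic engineering"),
    (2012, "Cyanobacteria engineering"),
]
conn.executemany("INSERT INTO articles (year, title) VALUES (?, ?)", rows)
conn.commit()

# The aggregation runs inside SQLite; only the small summary is fetched.
counts = conn.execute(
    "SELECT year, COUNT(*) FROM articles GROUP BY year ORDER BY year"
).fetchall()
conn.close()

print(counts)
```

The same pattern carries over to R through a database interface package: the query does the heavy lifting and the session only receives the summary.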


Regular expressions make it possible to process data in text format and help extract content from articles and patents.
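As an illustration, the sketch below uses Python's `re` module to pull DOI-like identifiers out of free text. The sample text and the identifiers in it are made up, and the pattern is a simplified assumption that does not cover every valid DOI.

```python
import re

# Hypothetical text with made-up identifiers.
text = "See doi:10.1234/abcd.5678, and also 10.5555/xyz-9 for details."

# Simplified pattern: "10." + 4 to 9 digits + "/" + a run of characters
# that stops at whitespace, a comma or a semicolon.
doi_pattern = re.compile(r"\b10\.\d{4,9}/[^\s,;]+")

dois = doi_pattern.findall(text)
print(dois)
```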


I tried to find a way to work that allows me to:

Faced with these constraints, the path I have chosen is efficient for my needs, but it is not attractive to most people.
I use a maximized terminal window (on Mac I use iTerm2) running Tmux. Using a terminal with Tmux allows me to connect to other computers via ssh, and Oh My Zsh makes life more agile. neovim is the text editor I use to edit any type of text file. I use several plugins in Oh My Zsh, Tmux and neovim. All colors are Solarized Dark or Solarized Light. To manage files I use ranger.

The main methodology guiding my work is network analysis. The basic texts on network analysis that I recommend are:

I use Natural Language Processing (NLP) to analyze the contents of articles and patents. The idea is to obtain the content of a set of documents without having to read them, using language filters that return candidate terms. The importance of these candidate terms is then measured with metrics such as tf-idf.
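A minimal tf-idf sketch in plain Python follows. The documents and the stopword list are hypothetical, a stopword filter stands in for the language filters mentioned above, and the formula used (term frequency times the natural log of the number of documents over the document frequency) is one common variant among several.

```python
import math
from collections import Counter

# Hypothetical documents and stopword list.
docs = {
    "d1": "algae biodiesel production from algae",
    "d2": "biodiesel production from soybean",
    "d3": "patent analysis of biodiesel production",
}
stopwords = {"from", "of"}

def tf_idf(docs, stopwords):
    """Score each candidate term per document with tf * ln(N / df)."""
    tokenized = {
        name: [w for w in text.lower().split() if w not in stopwords]
        for name, text in docs.items()
    }
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        scores[name] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

scores = tf_idf(docs, stopwords)
top_d1 = max(scores["d1"], key=scores["d1"].get)
print(top_d1, scores["d1"])
```

Terms that appear in every document, such as "biodiesel" here, score zero, while terms concentrated in one document rise to the top. That is what makes the metric useful for describing what sets a group of documents apart.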


