Development

Start: 04/2022.

End: 06/2023.

Project description (MBA Page)

fgv

Big Data is the term used to describe the vast volume of data that impacts businesses on a day-to-day basis. The Executive MBA in Business Analytics and Big Data enabled us to analyze business problems and apply analytical techniques in the current scenario, which is characterized by the complexity, diversity and high volume of digital data.

During the course we learned about:

  • Analytical ability: managing and conducting projects involving structured and unstructured databases;
  • Business management: analyzing and identifying business problems and generating solutions based on the practical application of methods;
  • Analytical problem-solving: data modeling, quantitative analysis, and problem identification and resolution to support effective decision-making;
  • Analytical project management: understanding the benefits, challenges, and risks associated with analytical projects in order to execute them successfully and deliver valuable insights;
  • Analytical tools proficiency: understanding the characteristics and requirements of the key techniques and analytical tools used in formulating, modeling, and analyzing databases;
  • Statistical analysis: fundamentals of statistical analysis and computational methods for data analysis in organizations;
  • Proficiency in handling databases: familiarity with the characteristics and requirements of the techniques used to manage structured databases, as well as distributed and large-volume databases.

Topics

  • Controladoria Gerencial (Managerial Controlling)
  • Aplicações em Decisões Mercadológicas (Applications in Marketing Decisions)
  • Desafios de Projetos Analíticos (Challenges in Analytical Projects)
  • Decisões Empresariais e Raciocínio Analítico (Business Decisions and Analytical Reasoning)
  • Análise Exploratória de Dados (Exploratory Data Analysis)
  • Modelagem Informacional (Information Modeling)
  • Banco de Dados e Visualização (Database and Visualization)
  • Banco de Dados Distribuídos (Distributed Databases)
  • Inferência Estatística (Statistical Inference)
  • Modelagem Estatística Avançada (Advanced Statistical Modeling)
  • Métodos Matriciais e Inferências de Clusters (Matrix Methods and Cluster Inferences)
  • Estatística Espacial (Spatial Statistics)
  • Aplicações de Estatística Espacial (Applications of Spatial Statistics)
  • Análise Preditiva (Predictive Analysis)
  • Análise Preditiva Avançada (Advanced Predictive Analysis)
  • Análise de Séries Temporais (Time Series Analysis)
  • Análise Econômica e Geração de Valor (Economic Analysis and Value Generation)
  • Análise de Mídias Sociais e Mineração de Texto (Social Media Analysis and Text Mining)

Database and Visualization with Tableau

(Project 1)

(Project 2)

Screenshot

Exploratory Data Analysis report to identify the 50 most and 50 least violent cities in the US and explain the factors behind them, among other exploratory analyses.

Statistical Inference report to study the variable Gross Production of the United States (USA). From it, we filtered the indices that make up the country's economy and constructed a linear model.

Matrix Methods and Cluster Inferences report using (a) PCA, (b) K-Medoids, (c) DBSCAN, (d) cluster silhouette analysis, and (e) variable distribution per cluster.
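To illustrate the silhouette analysis step, here is a minimal sketch of the mean silhouette coefficient in plain Python. The function name `silhouette_score`, the toy points, and the labels are made up for this example and are not taken from the report, which most likely relied on a library implementation.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances from p to every other point, grouped by cluster label.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        if label not in dists:
            scores.append(0.0)  # singleton cluster: silhouette is taken as 0
            continue
        a = sum(dists[label]) / len(dists[label])  # mean intra-cluster distance
        # Mean distance to the nearest other cluster.
        b = min(sum(d) / len(d) for c, d in dists.items() if c != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(good_score, bad_score)
```

A score close to 1 indicates compact, well-separated clusters, while a score near 0 or below suggests that points sit between clusters or are assigned to the wrong one.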

Spatial Statistics report to study and model the spatial behavior of three types of violations committed in the city of Philadelphia.

Applications of Spatial Statistics report to study the number of deaths from COVID-19 up to 12/30/2020 in the municipalities of Rio de Janeiro state and create a model to identify whether the cause is spatially relevant.

Predictive Analysis report to classify and predict, with real FIES data, whether a student is a good or bad payer using only logistic regression methods. Our best prediction model was a logistic regression with Ridge regularization and a decision threshold of 0.4, which obtained a recall of 0.88.
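The effect of moving the decision threshold can be sketched as follows. The helper `recall_at_threshold` and the labels and probabilities are invented for illustration; they are not the FIES data, and which class the report treated as positive is an assumption here.

```python
def recall_at_threshold(y_true, y_prob, threshold):
    """Recall of the positive class when scores >= threshold count as positive."""
    predicted = [1 if p >= threshold else 0 for p in y_prob]
    true_pos = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        return 0.0  # no positive examples in the data
    return true_pos / (true_pos + false_neg)


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.90, 0.55, 0.45, 0.30, 0.60, 0.42, 0.20, 0.10]

recall_default = recall_at_threshold(y_true, y_prob, 0.5)
recall_lowered = recall_at_threshold(y_true, y_prob, 0.4)
print(recall_default, recall_lowered)
```

Lowering the threshold can only keep or increase recall, at the cost of more false positives, so the choice of threshold is a trade-off rather than a free improvement.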

Advanced Predictive Analysis report to identify fraud in an imbalanced dataset, applying oversampling techniques such as SMOTE and testing various models, including Random Forest, XGBoost, Naive Bayes, Gradient Boost, LGBM, KNN and Neural Network. The best model was the Bagging classifier, with a recall of 0.216 and the best financial return.
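The idea behind SMOTE can be sketched in a few lines: each synthetic sample is placed at a random position on the segment between a minority-class point and one of its nearest minority neighbours. This is a simplified illustration, not the library implementation used in the report; the function `smote_sketch`, its parameters, and the sample points are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies between two real minority samples, oversampling should be applied to the training split only, so that no interpolated information leaks into the evaluation data.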

Time Series Analysis report to predict passenger behavior in air transport using ARIMA.
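As a rough illustration of what an ARIMA model does, the sketch below fits an ARIMA(1,1,0) model without intercept by least squares: the series is differenced once, and an AR(1) coefficient is estimated on the differences. The model order, the function names, and the series are assumptions for this example; the report does not state the order it used, and real work would normally rely on a statistical library.

```python
def fit_arima_110(series):
    """Estimate phi of an ARIMA(1,1,0) model (no intercept) by least squares."""
    # The "I" step with d = 1: work on first differences.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # The "AR" step with p = 1: regress each difference on the previous one.
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den


def forecast_next(series, phi):
    """One-step-ahead forecast: last value plus the predicted next difference."""
    last_diff = series[-1] - series[-2]
    return series[-1] + phi * last_diff


# Made-up passenger counts, for illustration only.
passengers = [100, 110, 105, 115, 110, 120]
phi = fit_arima_110(passengers)
next_value = forecast_next(passengers, phi)
print(phi, next_value)
```

Here the differences alternate in sign, so the estimated coefficient is negative and the forecast pulls back from the last observed increase.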

Word cloud and graph from the Social Media Analysis and Text Mining course, built from BNDES news collected from the G1 website.

Tools

  • Google Colab
  • Python
  • R
  • Tableau
  • Gephi
  • SQL
  • Machine Learning
  • Time-Series