Big Data Tools Review – Case Study: Spark Vs Flink

Authors

  • David Francisco Cudijinguissa, Instituto Superior Politécnico de Privado do Kilamba

DOI:

https://doi.org/10.51473/rcmos.v1i2.2025.1517

Keywords:

Big Data, Data Analysis, Open Source, Spark, Flink

Abstract

This article compares the performance of two Big Data tools, Spark and Flink, in light of five attributes that make Big Data systems highly complex and demanding in terms of processing. Because of these attributes, the tools used to work with such data tend to be significantly more robust than conventional ones, and often more expensive as well. Open-source systems provide access to the source code, which makes it easier for collaborators to understand the systems and algorithms and to adapt them to their project needs. The study uses descriptive research to relate the variables, with an exploratory and explanatory approach of a qualitative nature. A case study compares the Spark and Flink platforms on factors such as scalability, data storage, complexity, and implementation options.

Published

2025-10-15

How to Cite

CUDIJINGUISSA, David Francisco. Big Data Tools Review – Case Study: Spark Vs Flink. Multidisciplinary Scientific Journal The Knowledge, Brazil, v. 1, n. 2, 2025. DOI: 10.51473/rcmos.v1i2.2025.1517. Available at: https://submissoesrevistarcmos.com.br/rcmos/article/view/1517. Accessed: 20 Oct. 2025.