45 Big Data Terms To Have At The Tip Of Your Tongue
Big Data is an umbrella term that covers many related terms which, although important, are not part of everyone’s day-to-day vocabulary. With that in mind, we have created this list of 45 Big Data terms for you to expand your knowledge and consult whenever you need to.
Did we miss a word? Get in touch! The list below is constantly expanding, and it would be great to have your help making it an even more powerful and valuable source of information.
The Terms:
- Big Data: the name given to the large volumes of data available on the internet, which grow every second. IDC estimates that by 2025, the volume of data worldwide will reach 175 zettabytes. To put that size in perspective, if you tried to download a 175-zettabyte file at an average connection speed of about 25 Mb/s, it would take about 1.8 billion years.
- Data Mining: the process of extracting information from large volumes of data in search of patterns and valuable insights.
- Machine Learning: computer algorithms that can improve automatically through experience and the use of data.
- Web crawler: digital robots that scan websites or digital databases faster than humans can. They can deliver and update information with high accuracy in real time.
- Web scraper: digital robots that collect more specific data than web crawlers. Like web crawlers, they capture data at high speed and with high accuracy, which makes automated data collection both complete and relevant to a company’s purposes.
- Price Scraping: the extraction of product price data from eCommerce sites. It can be done in real time using web scrapers.
- Captcha: a type of cognitive challenge mainly used as an anti-spam tool. You know those tests on some websites that ask you to identify vehicles, traffic lights, or pedestrian crossings? They are an example of a Captcha.
- Proxy: refers to proxy servers, intermediaries that relay requests between a client and a website. In automated data collection, they are used to prevent bots from being blocked when requesting information from the sites the data is collected from.
- Artificial Intelligence (AI): systems capable of imitating human intelligence when executing tasks. AI has a variety of practical uses in everyday life, such as in search engines, online advertisements, content recommendation systems, virtual assistants, facial recognition, spam filtering, and autonomous vehicles.
- Data Engineering: the activity that involves, among other things, the collection, translation, and validation of data for subsequent analysis.
- Data Science: a set of strategies, tools, and processes used to obtain precise, high-quality insights from Big Data. With it, companies can identify opportunities more quickly, discover valuable talent, and win and retain more customers, among other advantages.
- Data Analytics: the process of analyzing data in search of information that is useful for the organization’s objectives. It is an activity that needs to consider details such as metadata, the dependencies between data, and the relationships between data and the real world.
- Data-Driven: refers to organizations that routinely rely on data in decision-making. Companies that make data-driven decisions tend to earn more revenue, serve customer needs better, and become more profitable.
- Python: a high-level programming language released in 1991. It is widely used to develop web crawlers and to create AI applications.
- ETL (extract, transform, load): the acronym refers to extracting data from multiple sources, transforming it, and loading it into a destination system for storage.
- SQL: acronym for Structured Query Language. It is a declarative query language used by relational databases.
- CSV: Comma-Separated Values. It is a plain-text file format in which commas separate the values in each row.
- JSON: JavaScript Object Notation. It is one of the most common formats for delivering structured data after the collection and scraping work is done.
- API: Application Programming Interface. APIs are essential for establishing communication between two different applications, enabling the exchange of information between them and the collection of data by bots.
- Data Lake: a repository capable of storing both structured and unstructured data. It brings together, in a unified and easily accessible location, the information in many different formats that organizations collect to help guide their strategies and decisions.
- Data Swamp: a data lake that has no quality control and is full of unorganized and unstructured data.
- Cloud Computing: the technology that allows remote access to computing services over the internet. It can be used to analyze data, store data, and make backups, among other uses.
- Algorithm: in computer science, this is a sequence of actions or instructions to find a solution to a given problem.
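- Data Transformation: the process of converting data from one format or structure into another, as in the "transform" step of ETL.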
- Data Warehouse: a data management system that aggregates data from different sources to support activities such as analysis and data mining.
- Natural language processing (NLP): an area of computing that helps computers better understand human language.
- NoSQL: database management systems that do not use, or do not only use, the relational tables found in most traditional databases.
- Software-as-a-Service (SaaS): the provision of software as a service over the internet. Examples of SaaS include Google Drive, Dropbox, Slack, Salesforce, and Mailchimp.
- Structured Data: data that conforms to a certain pre-established data model. This type of data also has a well-defined structure and can be easily accessed by people or programs.
- Unstructured Data: data that does not have a pre-defined data model, i.e., that is not organized in a pre-defined way. Examples include texts, videos, emails, photos, audio files, and web pages.
- Data Center: a facility that houses computing infrastructure, where services such as storage, management, backup, and data recovery, among others, are performed.
- LGPD: Lei Geral de Proteção de Dados, Brazil’s General Data Protection Law. It was enacted in 2018, came into force in 2020, and amended two articles of the Marco Civil da Internet.
- GDPR: General Data Protection Regulation. It is the European Union’s regulation on data protection and privacy. It was adopted in 2016 and became applicable in 2018.
- IoT: Internet of Things. It refers to objects such as TVs, watches, and cars that have internet access and constantly generate new data.
- Business Intelligence: a concept directly related to the collection and study of data, which can generate significant market gains for companies in all segments.
- RPA: Robotic Process Automation. It is the automation of repetitive activities within a company using virtual robots.
- Open Data: the idea that some data should exist securely in a shared form within a data ecosystem. A practical example is open banking.
- Metadata: descriptive information about specific data, i.e., data about data.
- Intelligent pricing: pricing based on statistics and algorithms that use up-to-date information on sales, costs, visits, and stock.
- Actionable data: this is data that, once collected and processed, is effectively used to precisely define the company’s next steps.
- JSON: short for JavaScript Object Notation, a standard for fast and simple data exchange.
- Data cleansing: the name given to the process of removing corrupt, incomplete, or duplicate data.
- Data aggregation: the process of compiling and combining data from different databases for further processing.
- Data governance: a set of practices and solutions related to the management and use of data. The idea of governance is to extract maximum value from data to benefit an organization.
- Hyper Automation: the combination of Artificial Intelligence and RPA tools, among others, to automate any repetitive tasks an organization performs.
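
Several of the terms above (CSV, ETL, data cleansing, data aggregation, and JSON) fit together in one small pipeline. The Python sketch below is only an illustration under stated assumptions: the products, stores, prices, and function names are all hypothetical, and a real pipeline would read from files, APIs, or databases rather than from a string in the script.

```python
import csv
import io
import json

# Hypothetical price data, in the CSV form a web scraper might deliver.
RAW_CSV = """product,store,price
laptop,StoreA,1000.00
laptop,StoreB,950.00
laptop,StoreB,950.00
phone,StoreA,
phone,StoreB,500.00
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Data cleansing: drop incomplete rows and exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        key = (row["product"], row["store"], row["price"])
        # Skip rows with an empty field or rows already seen.
        if not all(key) or key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean


def aggregate(rows):
    """Data aggregation: keep the lowest price found for each product."""
    lowest = {}
    for row in rows:
        product = row["product"]
        price = float(row["price"])
        if product not in lowest or price < lowest[product]:
            lowest[product] = price
    return lowest


def load(summary):
    """Load: serialize the result as JSON, ready to store or pass on."""
    return json.dumps(summary, sort_keys=True)


result = load(aggregate(cleanse(extract(RAW_CSV))))
print(result)  # prints {"laptop": 950.0, "phone": 500.0}
```

Here the duplicate laptop row and the phone row with no price are removed during cleansing, so only three rows reach the aggregation step. Keeping each step in its own function mirrors the extract, transform, load order and makes it easy to swap in a different source or destination.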