Fundamentals of Computer Science
Part 1: Understanding theoretical computer science
While you don’t need to be a master mathematician to love computer science, these two subjects are intrinsically tied. Computer science, particularly programming, uses algorithms, which are algebraic in nature. The important point here is that they are mathematical. The logical processes stem from the philosophical nature and history of mathematics. Now, if mathematical topics are not to your liking, don’t despair. The logical processes needed to become a programmer and developer can be used without having to learn higher mathematics. Knowing higher mathematics just simplifies some concepts for those who have that background.
Theoretical computer science includes multiple theories and topics. Some of the topics and theories are listed as follow:
- Algorithms
- Coding theory
- Computational biology
- Data structures
- Cryptography
- Information theory
- Machine learning
- Automata theory
- Formal language theory
- Symbolic computation
- Computational geometry
- Computational number theory
1. Algorithm
An algorithm is a set of instructions that a computer can read. Algorithms provide the rules or instructions in a way that means a computer can logically process the information provided as input and create an output.
2. Coding theory
Coding theory is also sometimes known as algebraic coding theory. When working with code and coding theory, there are three areas that are studied: data compression , error correction , and cryptography
a. Data compression
The importance of data compression cannot be understated. Data compression allows us to store the maximum amount of information while taking up the least amount of space. In other words, data compression uses the fewest number of bits to store the data
As our technology and storage capacities have grown and improved, our ability to store additional data has as well. Historically, computers had kilobytes or megabytes of storage when first introduced into households, but they currently have gigabytes and terabytes worth of storage. The conversions for each of the storage units are shown as follows:
If you look for information online, you may find that some sources state that there are 1,024 gigabytes in a terabyte. That is a binary conversion. In the decimal system, or base-10 system, there are 1,000 gigabytes per terabyte. To understand conversion better, it is important to understand the prefixes that apply to the base-10 system and the prefixes that apply to the binary system:
As mentioned, the goal is always to use the least amount of bits for the largest amount of data possible. Therefore, we compress, or reduce, the size of data in order to use less storage.
So, why is data compression so important? Let’s go back in time to 2000. Back then, a laptop computer on sale for about $1,000 had about 64 MB of RAM (Random Access Memory) and 6 GB of hard drive memory. A photograph on our digital phones takes anywhere from 2 to 5 megabytes of memory when we use its actual size. That means our computers couldn’t store many (and in some cases any) of the modern pictures we take now. Data compression advances allow us to store more memory, create better games and applications, and much more, as we can have better graphics and additional information or code without having to worry as much about the amount of memory they use.
b. Error correction
In computer science, errors are a fact of life. We make mistakes in our processes, our algorithms, our designs, and everything in between. Error correction, also known as error handling, is the process a computer goes through to automatically correct an error or multiple errors, which happens when digital data is incorrectly transmitted.
An Error Correction Code (ECC) can help us analyze data transmissions. ECC locates and corrects transmission errors. In computers, ECC is built into a storage space that can identify common internal data corruption problems. For example, ECC can help read broken codes, such as a missing piece of a QR (Quick Response) code. A type of ECC is a hamming code . A hamming code is a binary linear code that can detect up to two-bit errors.
Another type of ECC is a parity bit . A parity bit checks the status of data and determines whether any data has been lost or overwritten. Error correction is important for all software developed, as any updates, changes, or upgrades can lead to corruption of the entire program or parts of the program or software.
c. Cryptography
Cryptography is used in computer science to hide code. In cryptography, information or data is written so that it is unreadable by anyone other than the intended recipient of the message. In simple terms, cryptography takes readable text or information and converts it into unreadable text or information.
When we think about cryptography now, we tend to think of encryption of data. Coders encrypt data by converting it into code that cannot be seen by unauthorized users. However, cryptography has been around for centuries, that is, it pre-dates computers. Historically, the first uses of cryptography were found around 1900 BC in a tomb in Egypt. Atypical or unusual hieroglyphs were mixed with common hieroglyphs at various parts of the tomb.
The reason for the unusual hieroglyphs is unknown, but the messages were hidden from others with their use. Later on, cryptography would be used to communicate in secret by governments and spies, in times of war and peace. Nowadays, cryptography is used to encrypt data, as our information exists in digital format, so protecting sensitive information, such as banking, demographic, or personal data is important.
3. Computational biology
Computational biology is the area of theoretical computer science that focuses on the study of biological data and bioinformatics. Bioinformatics is a science that allows us to collect biological data and analyze it. An example of bioinformatics is the collection and analysis of genetic codes. In the study of biology, large quantities of data are explored and recorded.
Studies can be wide-ranging in topics and interdisciplinary. For example, a genetic study may include data from an entire state, an entire race, or an entire country. Some areas within computational biology include molecules, cells, tissues, and organisms. Computational biology allows us to study the composition of these things, from the most basic level to the larger organism. Bioinformatics and computational biology provide a structure for experimental studies in these areas, create predictions and comparisons, and provide a way to develop and test theories
4. Data structures
In coding theory, we use data structures to collect and organize data. The goal is to prepare the data so that we can perform operations efficiently and effectively. Data structures can be primitive or abstract. Software has built-in data structures, which are the primitive data structures, or we can define them using our programming language. A primitive data structure is pre-defined. Some primitive data structures include integers, characters (char), and Boolean structures. Examples of abstract or user-defined data structures include arrays and two-dimensional arrays, stacks, trees and binary trees, linked lists, queues, and more.
User-defined data structures have different characteristics. For example, they can be linear or non-linear, homogeneous or non-homogeneous, and static or dynamic. If we need to arrange data in a linear sequence, we can use an array, which is a linear data structure. If our data is not linear, we can use non-linear data structures, such as graphs. When we have data that is of a similar type, we use homogeneous data structures.
Keep in mind that an array, for example, is both a linear and homogeneous data structure. Non-homogeneous or heterogeneous data structures have dissimilar data. An example of a non-homogeneous data structure a user can create is a class. The difference between a static and a dynamic data structure is that the size of a static structure is fixed, while a dynamic structure is flexible in size.
5. Information theory
Information theory is defined as a mathematical study that allows for the coding of information so that it can be transmitted through computer circuits or telecommunications channels. The information is transmitted through sequences that may contain symbols, impulses, and even radio signals.
In information theory, computer scientists study the quantification of information, data storage, and information communication. Information can be either analog or digital in information theory. Analog data refers to information represented by an analog signal. In turn, an analog signal is a continuous wave that changes over a given time period. A digital signal displays data as binary, that is, as a discrete wave. We represent analog waves as sine waves and digital waves as square waves. The following graph shows the sine curve as a function of value over time:
An analog signal is described by the key elements of a sine wave: amplitude, period, frequency, and phase shift:
- The amplitude is the height of the curve from its center. A sine curve repeats infinitely.
- The period refers to the length of one cycle of the sine curve, that is, the length of the curve before it starts to repeat.
- The frequency and the period of the sine curve have an inverse relationship:
In relation to the inverse relationship, we can also say:
\[period = \frac{1}{frequency}\]- The phase shift of a sine curve is how much the curve shifts from 0. This is shown in the following graph:
In contrast, digital signal graphs look like bar graphs or histograms. They only have two data points, 0 or 1, so they look like boxy hills and valleys:
Digital signals have finite sets of discrete data. A dataset is discrete in that it contains individual and distinct data points. For analog signals, the data is continuous and infinite. When working with computer science, both types of signals are important and useful.
6. Automata theory
Automata theory is one of the most fascinating topics in theoretical computer science. It refers to the study of machines and how calculations can be completed in the most reliable and efficient way. Automata theory involves the physical aspects of simple machines as well as logical processing. So, what exactly is automata used for and how does it work?
Automata are devices that use predetermined conditions to respond to outside input. When you look at your thermostat, you’re working with an automata. You set the temperature you want and the thermostat reacts to an outside source to gather information and adjust the temperatures accordingly.
Another example of automata are surgical robots. These robots can improve the outcomes of surgeries for patients and are being improved upon constantly. Since the goal of automata theory is to make machines that are reliable and efficient, it is a critical piece in the development of artificial intelligence and smart robotic machines such as surgical robots
7. Formal language theory
Formal language theory is often tied to automata theory in computer science. Formal language is the study of the syntax, grammar, vocabulary, and everything involving a formal language. In computer science, formal language refers to the logical processing and syntax of computer programming languages. With regard to automata, the machines process the formal language to perform the tasks or code provided for it.
8. Symbolic computation
Symbolic computation is a branch of computational mathematics that deals with computer algebra. The terms symbolic computation and computer algebra are sometimes used interchangeably. Some programming software and languages are focused on the symbolic computations of mathematics formulas. Programs using symbolic computation perform operations such as polynomial factorization, simplifying algebraic functions or expressions, finding the greatest common divisor of polynomials, and more.
9. Computational geometry
Like symbolic computation, computational geometry lives in the branch of computer science that deals with computational mathematics. The algorithms we study in computational geometry are those that can be expressed with geometry. The analysis of the data is done with geometric figures, geometric analysis, data structures that follow geometric patterns, and more. The input and output of problems that require computational geometry are geometric.
When thinking of geometry, we often revert to the figures we mostly associate with that branch of mathematics, such as polygons, triangles, and circles. That said, when we look at computational geometry, some of the algorithms are those that can be expressed by points, lines, other geometric figures, or those that follow a geometric pattern. Triangulation falls under this branch of computer science.
Triangulation of data is important for applications such as optical 3D measuring systems. We triangulate GPS signals to locate a phone, for example, which is used in law enforcement.
10. Computational number theory
Number theory is the branch of mathematics that studies integers and their properties. Computational number theory then is the study of algorithms used to solve problems in number theory. Part of the study of number theory is primality testing.
Algorithms created to determine whether input or output is prime have been used for many purposes. One of the most critically important uses and applications of primality testing and number theory is for encryption purposes. As our lives have moved to saving everything electronically, our most personal information, such as banking information, family information, and even social security numbers, live in some code or algorithm. It is important to encrypt such information so others cannot use or access it.
Part 2: Learning about a system’s software
System’s software is used to perform multiple functions and communicate between the operating system (OS) of a computer, peripherals such as a keyboard and mouse, and firmware, which is permanently saved to a device and is needed for its operation, among other functions. These are part of the two main types of software: system software and application software .
System software allows a computer to communicate between the hardware and the applications. Think of a smartphone. The phone is composed in its most basic form of the hardware, which includes the battery, cameras, memory, screen, and all the physical components and peripherals. The OS allows those components to be used by applications.
Take the camera application of a phone. The system’s software lets the application communicate with the phone to use the camera to take a picture, edit it, save it, and share it. A computer’s OS also allows the hardware to communicate with programs. A design program will use the mouse or other peripheral that can be used to draw, create, use a touchscreen if available, and more.
If we do not know our system’s software, we cannot create applications that can communicate effectively with our hardware, creating errors that can range from critical, or rendering a peripheral useless, to minor, where some components may work, say taking a picture, but others may not, such as saving or sharing the picture. The system’s software is created in a way that provides us with the easiest, most efficient way to communicate between the hardware and applications
1. Operating systems
The OS performs multiple tasks. If you recall, error handling is part of an OS that checks for the most common possible errors in order to fix them without creating a larger problem or rendering an application worthless. Error handling is one of the operating system’s most important tasks. In addition, the OS is responsible for the security of your computer or device. If you have a smartphone, you know that many updates to the OS are done in order to fix a security problem or to prevent a security breach. The OS is responsible for only allowing an authorized user to interact with the content that is stored in the device.
In addition to security and error handling, an OS is responsible for allocating memory for files and organizing them. When we save and delete a file or program, the memory that had been used is now free. However, there may be something saved immediately before and immediately after. The OS allocates and reallocates memory in order to maintain the best performance possible by the device. Memory management not only refers to user-saved files, but also to the RAM.
The file management of a device is also run by the OS. The OS will allocate the information as a filesystem, breaking the information into directories that are easily accessed by the user and by the device. The filesystem is responsible for keeping track of where files are, both from the OS and from the user, the settings for access to the device, which are evolving constantly, and how to access the files and understand the status of the files. Access to devices has changed in recent years.
While computers typically use a username and password, many devices can now be accessed through a fingerprint, a numerical or alpha-numerical passcode, facial recognition, images, paths, and more. As any of these topics evolve, the OS evolves as well and needs to be updated or recreated. The operating system is also responsible for allowing communication between the applications and the device.
2. Application software
Application software refers to software applications that perform a particular task. Think of the applications, or apps, that you can access from a mobile device. There are hundreds of types of applications, such as static games that live on the device, games that allow you to play with others remotely, news applications, e-book readers, fitness training apps, alarms, clocks, music, and so much more! Applications always perform some form of task, be it for personal use, business use, or educational use.
Application software has multiple functions. You may find suites for productivity, such as Microsoft (Office) and Google products. When we need to do research on the internet, we use applications called browsers, which allow us to access the information and index the information so that we can access it. These browsers include Google Chrome, Safari, Firefox, Edge, Opera, and others. Browsers are used by both mobile devices and computers. Keep in mind that the purpose of an app is to perform a specific task for the end user.
1
2
3
4
5
6
7
8
9
10
Note:
As an aside, applications have grown exponentially since computers became
household tools and phones started being used for other things rather than
just for calling others. Early computers were used for just that: computing,
or calculating mathematical analyses and tasks. That's one of the reasons it
is so important to have an understanding of the development and history of
computer science. Since we cannot completely predict future uses of computer
science and system software, the more we know about them, the more we will
be able to create and adapt when technological advances happen.
Part 3: Understanding computing
In computer science, computing refers to the activities that computers perform in order to communicate, manage, and process information. Computing is usually divided into four main areas: algorithms, architecture, programming languages, and theory.
1. Architecture
Computer architecture refers to the set of instructions that interact with computer systems. In more basic terms, the architecture includes the instructions that allow software and hardware to interact. Computer architecture has three main subcategories: Instruction Set Architecture (ISA) , Microarchitecture , and System Design .
a. Instruction Set Architecture (ISA)
The ISA is the boundary that exists between the hardware and the software. It is classified in multiple ways, but two common ones are complex instruction set computer (CISC) and reduced instruction set computer (RISC) . These are defined as follows:
- CISC : This is a computer that has explicit instructions for many tasks, such as simple mathematical operations, and loading something from memory. CISC includes everything that is not included in RISC.
- RISC : This is a computer with an architecture that has reduced cycles per instruction (CPI)
CISC tries to complete instructions with fewer steps, while RISC only uses simple instructions. CISC is multi-step, while RISC is single-step, performing one task at a time. The CISC process includes the instructions, the microcode conversion, microinstructions, and execution. By contrast, RISC includes instructions and execution.
In CISC, microcode conversion refers to the interpretation of the language at a lower level. It takes into consideration the hardware resources to create microinstructions. Microinstructions are single instructions in microcode. After microcode creates the microinstructions, the microinstructions can be executed. The following diagram shows the process for both RISC and CISC:
Both RISC and CISC are necessary for computer programmers. There are advantages and disadvantages to having a single-step process (RISC) versus a multi-step process (CISC). RISC reduces the cycles per instruction, doing one thing at a time. CISC reduces the instructions in a program, but at the cost of cycles per instruction. Depending on what our needs are, we can choose the best path to take
2. Programming languages
Programming languages are the way we write instructions for computers and other devices. Different languages are used depending on what is needed, ease of use, and much more:
- Python and Ruby
- C
- C++
- C#
- Swift
- Scratch
- Java and Javascript
- PHP
- SQL
In computational thinking, we use many different programming languages, depending on what our goals are, what information we have or need, and what our application or software requirements are. Choosing a language is dependent on not just our knowledge of the language, but the possible functionalities of the language.
Part 4: Learning about data types and structures
In computer science, data types and structures are two distinct things:
A data type is a basic classification. Some data types include integers, float, and strings.
Data structures use multiple types of data types. They can organize the information into the memory and determine how we access the information.
Let’s look at these in more detail in the following sections.
1. Data types
As mentioned, data types are basic classifications. They are variables that are used throughout a program and can only exist with one classification. There are different classes of data type. We will focus on primitive and abstract data types for now, but we will revisit this topic as we move through problems and design solutions.
Primitive data types include byte, short, int, long, float, double, Boolean, and char :
- A byte can store numbers from -128 to 127. While these numbers can be stored as integers, or int, a byte uses less storage, so if we know the number is between those values, we can use a byte data type instead.
- A short is a number between -32,768 and 32,767.
- An integer, int, is used to store numbers between -2,147,483,648 and 2,147,483,647.
- Long is used to store numbers from -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.
- A float allows us to save a decimal number.
- Decimal numbers can also be saved as double, which has more precision than a float.
- Boolean values are data types that are either True or False. So, a variable can be saved such that when its value is printed, the result will be saved as true or false.
- Char is used to save a variable as a single character
2. Data structures
As mentioned under the Coding theory section earlier in this chapter, data structures are used to collect and organize data in the most efficient and effective way possible. Data structures can be primitive, such as the built-in data structures in software, or abstract. Primitive data structures can also be defined using programming languages, but they are pre-defined. Some of the primitive data structures include the data types listed in the previous section, such as chars and Boolean structures.
Abstract data types (ADTs) include the information for the structure and design of data types. Abstract data structures include arrays and two-dimensional arrays, stacks, trees and binary trees, linked lists, queues, and more, as mentioned in the Coding theory section earlier in this chapter. Lists can contain multiple instances of the same data values. These lists are countable, so we can find how many elements are in the list, reorder them, remove items, add items, and so on. Lists are widely used as linked lists, arrays, or dynamic arrays:
- A linked list means that each data element in the list is connected, or points, to the next one, regardless of where they are stored within the memory.
- An array is ordered. The elements are read in order to be able to make sense. Think of an array as reading this sentence. You don’t read the sentence as “array an think reading as this of sentence.” We read the sentence in order, from left to right, not in a jumbled order.
- Dynamic arrays can be resized, which is important when choosing a data type.
A stack ADT is a collection of elements and has two operations – push and pop. A push is used to add an element to the collection while a pop removes the most recent element.
A queue ADT is a linear data structure. As with a stack, we can add or remove elements. However, in a queue ADT, the point of deletion and the point of insertion are done at two different ends. As mentioned before, the data structures are concrete implementations of data types. How we add or remove elements from a collection, for example, is the data structure.