Data Structure
Data Structure
Data Structure give us organization, storage, and access.
Data Is an information that is stored or processed by a computer.
Data Type An attribute of data that describes the values it can have and how the data can be used.
Types of Data
Common Types of Data
- Numbers
- Letters
- True and False
Numbers
- Whole Numbers (short, int and long).
- Decimal Numbers(double and float).
- Signed (Positive and negative values).
- Unsigned (Only positive values).
Boolean A Boolean is a true or false value. A Boolean is one bit.
Primitive Types
variableName -> data
- int
- doubles
- longs
- floats
- shorts
- booleans
- chars
Referenced Types
variableName -> memoryAddress -> value
- String
- Data Structures
Pointers
Example:
Collection A Collection B
3 -> 00000011 (5) -> 00000010
1 -> 00000001 4 -> 00000100
(5) -> 00000010
The fives in each collection can have the same memory allocation.
Each programming language determines what access you have to memory management tools. C++ (manage pointers, memory, and data) Java and Python (memory management handled for you)
Arrays
Collection of elements, where each item is identified by an index or key. Try to access an index that doesn’t exist will give a error with the phase “index out of bounds”.
Arrays Provide: Organization, storage and access.
Jagged Arrays A jagged array can have elements of different dimensions and sizes.
Example:
| | | | | |
|---|---|---|---|---|
| 1 | 3 | 8 |
| 1 | 2 |
| 9 | 0 |
| 10| 11 | 4 | 20 | 50
| 30|
This jagged array have 5 fixed rows and variable number of columns.
Resizable Arrays Java and C++ basic arrays cannot be resized. Ruby and JavaScript basic arrays can be resized.
- Immutable: basic array data that cannot change. Example: Java basic arrays.
- Mutable: Java classes give us resizable versions like arrayList.
ArrayList
- An arrayList is an array under the surface
- Focus less on maintaining data structures and more on creating
Example: myArrayList.add(2, 10); -> inserts the value 10 at index 2
graph LR
A[add, push] -->B[adding to the back of the array]
C[add, push] -->D[adding to the back of the array]
style A fill:#eee,stroke:#594,stroke-width:4px
style B fill:#eee,stroke:#555,stroke-width:2px
style C fill:#eee,stroke:#29f,stroke-width:4px
style D fill:#eee,stroke:#555,stroke-width:2px
Insert functionality in Non-mutable Arrays, can be done in two ways.
- If basic array is big enough: Items are shuffled down and a new item is added.
- If basic array is not big enough: All contents copied into a new, bigger basic
array, and new items are also copied over with it.
This can have major performance improvements.
Search Arrays
How can we search?
- Check every item
- If matched, return true
- If no match (after searching the entire structure), return false
Searching Outcomes
- Best case: it’s the first item
- Worst case: not in the array
Array Search Reminders
- Linear search occurs behind the scenes
- indexOf has no information about your data
- Check every element
You can also sort items to make search even faster.
Sort Arrays
Sorting with Programming Language
- Call sorting function to your collection of objects
- Pass your data structure as a parameter to a sorting function
Big O Notation
Notation used to describe the performance or complexity of an algorithm.
Operations
- Access
- Updated
- Insert
- Search
- Delete
- Sort
O(1) Time
- Consistent duration of algorithm execution in same time
(or space) regardless of the size of the input. - Also called “constant time”
Outcomes: Insertion
flowchart TD
A[Best-Case\nScenario]---B(Large enough array)-->C["O(1) time\n(constant time)"]
D[Worst-Case\nScenario]---E(array is full)-->F["O(n) time\n(linear time)"]
style A fill:#1ad,stroke:#555,stroke-width:2px
style D fill:#1ad,stroke:#555,stroke-width:2px
style B fill:#bbb,stroke:#555,stroke-width:2px
style E fill:#bbb,stroke:#555,stroke-width:2px
style C fill:#0c9,stroke:#555,stroke-width:2px
style F fill:#fc0,stroke:#555,stroke-width:2px
Outcomes: Search
flowchart TD
A[Best-Case\nScenario]-->B(compare to Item)-->C["O(1) time\n(constant time)"]
D[Worst-Case\nScenario]-->E(Item does not exists)-->F["O(n) time\n(linear time)"]
style A fill:#1ad,stroke:#555,stroke-width:2px
style D fill:#1ad,stroke:#555,stroke-width:2px
style B fill:#bbb,stroke:#555,stroke-width:2px
style E fill:#bbb,stroke:#555,stroke-width:2px
style C fill:#0c9,stroke:#555,stroke-width:2px
style F fill:#fc0,stroke:#555,stroke-width:2px
Outcomes: Deletion
flowchart TD
A[Best-Case\nScenario]-->B(Location and delete item)-->C["O(1) time\n(constant time)"]
D[Worst-Case\nScenario]-->E(Search, then locate,then delete item)-->F["O(n) time\n(linear time)"]
style A fill:#1ad,stroke:#555,stroke-width:2px
style D fill:#1ad,stroke:#555,stroke-width:2px
style B fill:#bbb,stroke:#555,stroke-width:2px
style E fill:#bbb,stroke:#555,stroke-width:2px
style C fill:#0c9,stroke:#555,stroke-width:2px
style F fill:#fc0,stroke:#555,stroke-width:2px
Linked List
Node contains data and a pointer.
Singly List only have pointers point to next note in the list.
Double List have next and previews pointer, can go forward and backward.
ArrayList
- Has behavior of a list on the surface
- Stored as an array under the hood
C# and C++ Have a double linkedList.
No built-in linked lists for Swift, Ruby, and JavaScript. But there are third party equivalents.
Python lists are not really lists, they are resizable arrays. No built-in linked list implementation.
Stacks and Queues
Stacks follow a Last in, First Out (LIFO). Great for programs where
you need to reverse things. Keeping track of state.
- Push add an item to stack.
- Pop remove last item from stack.
- Peek check the last item of stack without removing it.
Queue follow a First in, First Out (FIFO). Good for treads.
- Enqueue is when an item is added to a list.
- Dequeue is when an item is removed from the list.
- Peek see the first item in the queue without removing it.
Priority Queue each element has a priority associated with it.
a | t | w | r |
---|---|---|---|
high priority | low priority | medium priority | high priority |
- a dequeued first
- Then r dequeued
- Then w dequeued
- Then t dequeued
D-E-Q-U-E-K it’s a double-ended queue have a stack and queue at the same time. Items can
be added or removed from either end Items can be added or removed from either end.
Hash
Associative Array is Collection of key-value pairs.
Key –> Value
Key:Value
State –> Capital
California:Sacramento
Associated Array Rules
- Key-value pairs are bound together
- Each key must be unique
- Order isn’t important
- Values are accessed with the key
- Values do not need to be unique
Hashing is a way of take some raw data and mixing together to
make a small single data. It’s a data conversion process.
Hash Inputs
- Characters
- Objects
- Numbers
Hash output
- A integer
- They are not reversible
Collision occur anytime two inputs produce the same hash value.
Hash table is an implementation of the associative array
abstract data structure.
Trees and Graphs
Set
- A collection of unique items
- Order doesn’t matter
- None of the elements are duplicated
How Set Works
- Take an object.
- Hash the object.
- Store the object using the index.
- Repeat and check if object is stored.
Tree Data Structure child nodes with the same parent are siblings.
flowchart TD
1((1 Root\nNode))-->2((2))
1((1 Root\nNode))-->3((3))
2((2 Parent\nNode))-->4((4))
2((2 Parent\nNode))-->5((5))
3((3))-->6((6))
3((3))-->7((7))
4((4 Child\nNode))-->8((8))
4((4 Child\nNode))-->9((9))
5((5))-->10((10))
5((5))-->11((11))
6((6))-->12((12))
6((6))-->13((13))
7((7))-->14((14))
7((7))-->15((15))
Constrain: max two child nodes from any given node.
What Each Note Can Have
- Many children
- Just one child
- No children
Binary Search Tree have an additional constraint, order.
- Left child node < Parent node
- Right child node > Parent node
flowchart TD
1((30))-->2(( ))
1((30))-->3(( ))
2((20))-->4(( ))
2((20))-->5(( ))
3((40))-->6(( ))
3((40))-->7(( ))
4(( ))-->8(( ))
4(( ))-->9(( ))
5((25))-->10(( ))
5((25))-->11(( ))
6((35))-->12(( ))
6((35))-->13(( ))
7((50))-->14(( ))
7((50))-->15((60))
15((60))-->16(( ))
15((60))-->17((70))
Heap is a data structure implemented as a binary tree.
Priority Queue
- Order doesn’t matter
- Heaps are used to implement
Binary Search Tree: maintain sorted order while staying fast for insertion, deletion, and accessing. Heaps are types of binary tree there are great for priority queues.
Conclusion
Every data structure have their pros and cons and there is no
perfect data structure.
Depends on:
- What do you want to store?
- How do you want to store it?
- How do you want to access it?