- Characteristics
- Performance
- Supported Element Types
- Declaration
- Literal Syntax
- Access and Assign
- Iteration
- Sorting
- No Duplicates
- Further Information
Sets are dynamically resized unordered hash tables (key only) with bounds checking.
Characteristics
- Unordered
- No duplicates (each element value is only stored once)
Performance
- No indexing
- Fast append (constant time, O(1), except if resize is necessary)
- Fast lookup for contains test (constant time, O(1))
- No insert (because there is no order, append is the only way to add elements)
- Fast remove
- Large memory footprint (compared to sequence)
All element access depends on the performance of the hash function. For objects, the default hashing is based on pointers, which is very fast (but has the drawback that the set has to be rehashed if the garbage collector moves any of the referred elements). Hashing on strings is linear to the length of the string. Hashing on large values, such as point, can also be noticeable.
Supported Element Types
Supported element types for sets include:
- int8, int16, int, int64
- byte, word, nat
- double, float
- char, cchar
- str, str8, cstr, symbol
- Object
- Class, Type, Enum
- point (cm.geometry)
- pointF (cm.geometry)
- pointI (cm.geometry2D)
- angleF (cm.geometry2D)
- time, timespan, date, datetime
- rangeI, rangeF
Examples of types that cannot be used:
- range
- point2D (cm.geometry2D)
Declaration
elementType{} identifier;
Example
Declare a set of integers initialized to null:
{
int{} integers;
}
Declare and create an empty sequence (recommended):
{ int{} integers(); }
Same as above, using the new keyword:
{ int{} integers = new int{}; }
It's also possible to create the set using the init feature:
{ int{} integers; ... init integers(); }
You can also specify an initial capacity (measured in the number of elements) for the collection:
{ int{} integers(64); integers = new int{}(64); init integers(64); }
Even though the set will grow incrementally (by doubling its size) as necessary when you append elements, increasing the initial size can improve performance if you know that you will append a large number of elements. Generally speaking, however, you should be conservative with the allocation size.
Literal Syntax
{type: expr1, expr2 .. exprn}
{expr1, expr2 .. exprn}
Example
Create a set containing the elements 1, 2, and 3 as long integers:
{ int64{} values = {int64: 1, 2, 3}; }
Same as above but omits int: since the element type can be inferred from the expressions:
{
int64{} values = {1i64, 2i64, 3i64};
}
Access and Assign
There are a few different ways to access and assign elements in a set.
Access Example
{ int{} set = {int: 1, 2, 3, 2, 3, 4}; pln(#set); pln(#set.count); pln(#set.any); pln(#set.empty); pln(#set.contains(3)); pln(#(3 in set)); pln(#set.contains(42)); pln(#(42 in set)); }
Output: set={int: 1, 2, 3, 4} set.count=4 set.any=true set.empty=false set.contains(3)=true (3 in set)=true set.contains(42)=false (42 in set)=false
Assign Example
{ int{} set = {int: 1, 2, 3, 2, 3, 4}; pln(#set); set += {int: 31, 32}; pln(#set); set << 31 << 32 << 33; pln(#set); set.remove(3); pln(#set); }
Output: set={int: 1, 2, 3, 4} set={int: 32, 1, 2, 3, 4, 31} set={int: 32, 1, 2, 3, 4, 33, 31} set={int: 32, 1, 2, 4, 33, 31}
Iteration
Built-in for iteration is supported, but index operations (such as start=n, end=n, or reverse) are not suitable because the set has no deterministic order. You can, however, use the index keyword for-loop index. Also, note that the order of the elements in the iteration is to be considered random.
Example
{ int{} set = {1, 2, 3, 4, 31, 32}; for (z in set, index=i) pln(i; z); }
Output: 0, 32
1, 1
2, 2
3, 3
4, 4
5, 31
CM also supports modifying the set during iteration. Upon modify, a soft-copy of the set will be made and used for iterating the set. Thus you will always be iterating over the original set of elements. Adding or removing elements during an iteration is therefore safe.
Since a soft-copy is made, it may be ill-advised to modify an exceptionally large set during iteration.
Sorting
Sorting a set is not possible.
No Duplicates
It is important to note that sets will only ever store one instance of an object at a time. The exact behavior of this feature can be controlled by specifying an eq-function that defines if two elements are the same or not. For a set of objects, the default eq-function is by pointer comparison, which is very fast.
This property makes the set ideal to use as a "visited set" for example.
Example
private int myHash(int a, Object env) { return hash(a/10); } private bool myEq(int a, int b, Object env) { return |a - b| < 10; } { int{} values(); Object myHashEnv = null; values.setHash(function myHash, function myEq, myHashEnv); values << 10 << 20 << 21 << 22 << 23; pln(values); }
Output: {int: 10, 20}
Note: The hash and eq functions must match in such a way that any two elements that are considered equal must also return the same hash value, otherwise the set will not guarantee that it is void of duplicates.
Further Information
See file your workspace/base/cm/lang/set.cmz
Comments
0 comments
Please sign in to leave a comment.