Advent of Code 2022 in Oracle - Day 7

My solution for the seventh day of the Advent of Code 2022 challenge "No Space Left On Device" with Oracle. Today with a tree (not for christmas)...

Advent of Code

Advent of Code is a cool yearly programming challenge. Each day from 1st to 25th December a new challenge appears. You can solve it in any programming language you like. After deepening my Go skills last year I will do this in Oracle (so PL/SQL and SQL) this year.

I try to blog about my solutions but I won’t make it every day or give up after a while as they get more difficult and time-consuming. You can find my solutions for this year in this repository.

I really suggest you check Advent of Code out yourself as it is a lot of fun and you will improve your problem-solving skills.

The challenges are clearly phrased with a simple example you can do in your head. You then get a way bigger generated and user-specific input and have to solve the challenge with that. You have to enter a solution in an input field and get a star for each day you solved. There is also a second challenge every day where you have to adapt some part of your algorithm to solve it.

Challenge 1

You can find the challenge here.

Today’s challenge is really interesting. The elves gave us a (UNIX-like) device that ran out of storage. From some shell commands, we have to build an index that keeps track of folder and file sizes.

The input has some commands starting with a $ the other lines are output. The history basically traverses through a file system and lists directory contents:

1$ cd /
2$ ls
3dir a
414848514 b.txt
58504156 c.dat
6dir d
7$ cd a
8$ ls
9dir e
1029116 f
112557 g
1262596 h.lst
13

In the end, we need to provide the total size of all directories that are smaller than 100000 (elf-bytes?).

Solving challenge 1

Creating a table to store the file system info

I first went ahead and created a table to store all the info that could be extracted from the input.

1create table aoc_filesystem (
2  fs_path       varchar2(100 char) primary key
3, fs_type       varchar2(1 char) not null -- D for directory, F for file
4, fs_size       number(32,0) -- null for directories
5, fs_level      number(32,0)
6, fs_parent_dir varchar2(100 char)
7) inmemory;
8

We can also start by inserting the root directory:

1insert into aoc_filesystem (fs_path, fs_type, fs_size, fs_level, fs_parent_dir)
2values ('/', 'D', null, 0, null);
3

Parsing the directory changes

As we need to figure out directory sizes we first need to get the absolute file path for each file to be able to sum these up. For that, we need to process the directory changes.

I created an apex_t_varchar2 array where I append or remove a directory when we go deeper or higher in the file system:

1-- ...
2for rec in (
3  -- loop through every line
4  select rownum as line_no, column_value as line
5    from table ( select apex_string.split(input)
6                   from aoc_input
7                  where year = 2022 and day = 7 )
8)
9loop
10  -- commands start with a $
11  if rec.line like '$ %' then
12
13    -- ignore ls commands
14    if rec.line = '$ ls' then
15      continue;
16    elsif rec.line like '$ cd %' then
17      -- get cd destination
18      l_dest := substr(rec.line, 6);
19
20      if l_dest = '/' then
21        -- clear current path array
22        l_curr_path := apex_t_varchar2();
23      elsif l_dest = '..' then
24        -- delete last element from current path array (go up one level)
25        l_curr_path.trim(1);
26      else
27        -- add new path element to current path array (go down one level)
28        apex_string.push(l_curr_path, l_dest);
29      end if;
30
31       dbms_output.put_line('cd ' || l_dest || ' | ' || get_curr_path());
32    else
33      dbms_output.put_line('Unknown command: ');
34      dbms_output.put_line(rec.line_no || ' ' || rec.line);
35    end if;
36  -- ...
37

Parse size outputs

1-- ...
2-- inside same loop through every input line
3else
4
5  -- directory info starts with "dir"
6  if rec.line like 'dir%' then
7    -- get dirname
8    l_dest := substr(rec.line, 5);
9    l_path := join_to_curr_path(l_dest); -- get absolute path
10    l_level := l_curr_path.count + 1;
11    l_parent := get_curr_path();
12
13    insert into aoc_filesystem (fs_path, fs_type, fs_size, fs_level, fs_parent_dir)
14    values (l_path, 'D', null, l_level, l_parent);
15
16  -- file info starts with a number
17  else
18    -- file info looks like this: "14848514 b.txt"
19    l_size := regexp_substr(rec.line, '([0-9]+) .*', 1, 1, null, 1 );
20    l_dest := regexp_substr(rec.line, '[0-9]+ (.*)', 1, 1, null, 1 );
21    l_path := join_to_curr_path(l_dest); -- get absolute path
22    l_level := l_curr_path.count + 1;
23    l_parent := get_curr_path();
24
25    insert into aoc_filesystem (fs_path, fs_type, fs_size, fs_level, fs_parent_dir)
26    values (l_path, 'F', l_size, l_level, l_parent);
27  end if;
28
29end if;
30

After inserting all the info our table looks like this:

1select *
2  from aoc_filesystem
3 order by fs_path
4;
5/*
6FS_PATH     FS_TYPE        FS_SIZE    FS_LEVEL FS_PARENT_DIR
7___________ __________ ___________ ___________ ________________
8/           D                                0
9/a          D                                1 /
10/a/e        D                                2 /a
11/a/e/i      F                  584           3 /a/e
12/a/f        F                29116           2 /a
13/a/g        F                 2557           2 /a
14/a/h.lst    F                62596           2 /a
15/b.txt      F             14848514           1 /
16/c.dat      F              8504156           1 /
17/d          D                                1 /
18/d/d.ext    F              5626152           2 /d
19/d/d.log    F              8033020           2 /d
20/d/j        F              4060174           2 /d
21/d/k        F              7214296           2 /d
22*/
23

Calculate directory sizes

Looking at the table data hierarchical queries immediately come to mind. After some googling and trial and error I came up with the following query that sums up the dir sizes.

What it does is produce multiple rows for each file. Each file is listed in a directory it is included in, not only the direct parent. Look how /d/d.log is both in /d and /:

1select fs_size
2     , fs_path
3     , connect_by_root fs_path as contained_in
4  from aoc_filesystem
5connect by prior fs_path = fs_parent_dir
6order by contained_in;
7;
8
9/* Rusult:
10
11    FS_SIZE FS_PATH     CONTAINED_IN
12___________ ___________ _______________
13            /           /
14            /a          /
15            /a/e        /
16        584 /a/e/i      /
17      29116 /a/f        /
18    7214296 /d/k        /
19       2557 /a/g        /
20    8033020 /d/d.log    /
21    ............................
22    8033020 /d/d.log    /d
23    ............................
24    8033020 /d/d.log    /d/d.log
25    4060174 /d/j        /d/j
26    7214296 /d/k        /d/k
27*/
28

What we need to do now is to sum the sizes up and group them by directories. I also joined the result again with the source table to filter out the files and only get the directories:

1with contained_data as (
2  select fs_size
3       , fs_path
4       , connect_by_root fs_path as contained_in
5    from aoc_filesystem
6  connect by prior fs_path = fs_parent_dir
7), size_sums as (
8select sum(fs_size) dir_size_sum, contained_in as dir_path
9  from contained_data
10 group by contained_in
11)
12select dir_path, dir_size_sum
13  from size_sums
14  join aoc_filesystem
15    on dir_path = fs_path
16   and fs_type = 'D'
17;
18
19/* Result:
20DIR_PATH       DIR_SIZE_SUM
21___________ _______________
22/                  48381165
23/a                    94853
24/a/e                    584
25/d                 24933642
26*/
27

With that in place, we only need to sum up the size of the dirs that are smaller than 100000. You can check out the full solution here.

Challenge 2

In the second challenge, we need to calculate what (single) folder to delete if we want to have 30M elf-bytes of free space. We know that the drive has a capacity of 70M elf-bytes. We also want to delete the smallest folder possible. The input needed is the exact size of the folder that gets deleted.

Solving challenge 2

Because we have a pretty good setup from part 1 we don’t need to change that much. First, we need to know how much space we need to delete to get to 30M elf-bytes:

1def size_needed
2col needed_size for a30 new_value size_needed
3
4       --- 70000000 -> drive capacity
5       --- 30000000 -> what we want free
6       --- dir_size_sum -> used up space (just size of /)
7select 30000000 - (70000000 - dir_size_sum) as needed_size
8  from only_folders -- our query from part 1 that lists all folder sizes
9 where dir_path = '/'
10;
11

We can then use that value to find the folder that has the size closest to that

1select dir_path, dir_size_sum
2  from only_folders
3 where dir_size_sum >= &size_needed
4 order by dir_size_sum
5 fetch first 1 rows only
6;
7

You can check out the full solution here.