Advent of Code 2022 in Oracle - Day 7

My solution for the seventh day of the Advent of Code 2022 challenge "No Space Left On Device" with Oracle. Today with a tree (not for christmas)...

Advent of Code

Advent of Code is a cool yearly programming challenge. Each day from 1st to 25th December a new challenge appears. You can solve it in any programming language you like. After deepening my Go skills last year I will do this in Oracle (so PL/SQL and SQL) this year.

I try to blog about my solutions but I won’t make it every day or give up after a while as they get more difficult and time-consuming. You can find my solutions for this year in this repository.

I really suggest you check Advent of Code out yourself as it is a lot of fun and you will improve your problem-solving skills.

The challenges are clearly phrased with a simple example you can do in your head. You then get a way bigger generated and user-specific input and have to solve the challenge with that. You have to enter a solution in an input field and get a star for each day you solved. There is also a second challenge every day where you have to adapt some part of your algorithm to solve it.

Challenge 1

You can find the challenge here.

Today’s challenge is really interesting. The elves gave us a (UNIX-like) device that ran out of storage. From some shell commands, we have to build an index that keeps track of folder and file sizes.

The input has some commands starting with a $ the other lines are output. The history basically traverses through a file system and lists directory contents:

$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst

In the end, we need to provide the total size of all directories that are smaller than 100000 (elf-bytes?).

Solving challenge 1

Creating a table to store the file system info

I first went ahead and created a table to store all the info that could be extracted from the input.

create table aoc_filesystem (
  fs_path       varchar2(100 char) primary key
, fs_type       varchar2(1 char) not null -- D for directory, F for file
, fs_size       number(32,0) -- null for directories
, fs_level      number(32,0)
, fs_parent_dir varchar2(100 char)
) inmemory;

We can also start by inserting the root directory:

insert into aoc_filesystem (fs_path, fs_type, fs_size, fs_level, fs_parent_dir)
values ('/', 'D', null, 0, null);

Parsing the directory changes

As we need to figure out directory sizes we first need to get the absolute file path for each file to be able to sum these up. For that, we need to process the directory changes.

I created an apex_t_varchar2 array where I append or remove a directory when we go deeper or higher in the file system:

-- ...
for rec in (
  -- loop through every line
  select rownum as line_no, column_value as line
    from table ( select apex_string.split(input)
                   from aoc_input
                  where year = 2022 and day = 7 )
)
loop
  -- commands start with a $
  if rec.line like '$ %' then

    -- ignore ls commands
    if rec.line = '$ ls' then
      continue;
    elsif rec.line like '$ cd %' then
      -- get cd destination
      l_dest := substr(rec.line, 6);

      if l_dest = '/' then
        -- clear current path array
        l_curr_path := apex_t_varchar2();
      elsif l_dest = '..' then
        -- delete last element from current path array (go up one level)
        l_curr_path.trim(1);
      else
        -- add new path element to current path array (go down one level)
        apex_string.push(l_curr_path, l_dest);
      end if;

       dbms_output.put_line('cd ' || l_dest || ' | ' || get_curr_path());
    else
      dbms_output.put_line('Unknown command: ');
      dbms_output.put_line(rec.line_no || ' ' || rec.line);
    end if;
  -- ...

Parse size outputs

-- ...
-- inside same loop through every input line
else

  -- directory info starts with "dir"
  if rec.line like 'dir%' then
    -- get dirname
    l_dest := substr(rec.line, 5);
    l_path := join_to_curr_path(l_dest); -- get absolute path
    l_level := l_curr_path.count + 1;
    l_parent := get_curr_path();

    insert into aoc_filesystem (fs_path, fs_type, fs_size, fs_level, fs_parent_dir)
    values (l_path, 'D', null, l_level, l_parent);

  -- file info starts with a number
  else
    -- file info looks like this: "14848514 b.txt"
    l_size := regexp_substr(rec.line, '([0-9]+) .*', 1, 1, null, 1 );
    l_dest := regexp_substr(rec.line, '[0-9]+ (.*)', 1, 1, null, 1 );
    l_path := join_to_curr_path(l_dest); -- get absolute path
    l_level := l_curr_path.count + 1;
    l_parent := get_curr_path();

    insert into aoc_filesystem (fs_path, fs_type, fs_size, fs_level, fs_parent_dir)
    values (l_path, 'F', l_size, l_level, l_parent);
  end if;

end if;

After inserting all the info our table looks like this:

select *
  from aoc_filesystem
 order by fs_path
;
/*
FS_PATH     FS_TYPE        FS_SIZE    FS_LEVEL FS_PARENT_DIR
___________ __________ ___________ ___________ ________________
/           D                                0
/a          D                                1 /
/a/e        D                                2 /a
/a/e/i      F                  584           3 /a/e
/a/f        F                29116           2 /a
/a/g        F                 2557           2 /a
/a/h.lst    F                62596           2 /a
/b.txt      F             14848514           1 /
/c.dat      F              8504156           1 /
/d          D                                1 /
/d/d.ext    F              5626152           2 /d
/d/d.log    F              8033020           2 /d
/d/j        F              4060174           2 /d
/d/k        F              7214296           2 /d
*/

Calculate directory sizes

Looking at the table data hierarchical queries immediately come to mind. After some googling and trial and error I came up with the following query that sums up the dir sizes.

What it does is produce multiple rows for each file. Each file is listed in a directory it is included in, not only the direct parent. Look how /d/d.log is both in /d and /:

select fs_size
     , fs_path
     , connect_by_root fs_path as contained_in
  from aoc_filesystem
connect by prior fs_path = fs_parent_dir
order by contained_in;
;

/* Rusult:

    FS_SIZE FS_PATH     CONTAINED_IN
___________ ___________ _______________
            /           /
            /a          /
            /a/e        /
        584 /a/e/i      /
      29116 /a/f        /
    7214296 /d/k        /
       2557 /a/g        /
    8033020 /d/d.log    /
    ............................
    8033020 /d/d.log    /d
    ............................
    8033020 /d/d.log    /d/d.log
    4060174 /d/j        /d/j
    7214296 /d/k        /d/k
*/

What we need to do now is to sum the sizes up and group them by directories. I also joined the result again with the source table to filter out the files and only get the directories:

with contained_data as (
  select fs_size
       , fs_path
       , connect_by_root fs_path as contained_in
    from aoc_filesystem
  connect by prior fs_path = fs_parent_dir
), size_sums as (
select sum(fs_size) dir_size_sum, contained_in as dir_path
  from contained_data
 group by contained_in
)
select dir_path, dir_size_sum
  from size_sums
  join aoc_filesystem
    on dir_path = fs_path
   and fs_type = 'D'
;

/* Result:
DIR_PATH       DIR_SIZE_SUM
___________ _______________
/                  48381165
/a                    94853
/a/e                    584
/d                 24933642
*/

With that in place, we only need to sum up the size of the dirs that are smaller than 100000. You can check out the full solution here.

Challenge 2

In the second challenge, we need to calculate what (single) folder to delete if we want to have 30M elf-bytes of free space. We know that the drive has a capacity of 70M elf-bytes. We also want to delete the smallest folder possible. The input needed is the exact size of the folder that gets deleted.

Solving challenge 2

Because we have a pretty good setup from part 1 we don’t need to change that much. First, we need to know how much space we need to delete to get to 30M elf-bytes:

def size_needed
col needed_size for a30 new_value size_needed

       --- 70000000 -> drive capacity
       --- 30000000 -> what we want free
       --- dir_size_sum -> used up space (just size of /)
select 30000000 - (70000000 - dir_size_sum) as needed_size
  from only_folders -- our query from part 1 that lists all folder sizes
 where dir_path = '/'
;

We can then use that value to find the folder that has the size closest to that

select dir_path, dir_size_sum
  from only_folders
 where dir_size_sum >= &size_needed
 order by dir_size_sum
 fetch first 1 rows only
;

You can check out the full solution here.

Other Posts

Comments

Loading comments...
Homepage All Blogposts

AI disclaimer: I spend hours writing my blog posts by hand, adding my own thoughts and experiences to them. In my view, purely AI-generated content lacks that human depth and isn't worth publishing. I only use AI for research and editing assistance.