Installing PostgreSQL on Alpine RPi

Now that I have successfully installed Alpine on the Raspberry Pi I will be using as my database server, it’s time to install PostgreSQL.

When doing something new and unfamiliar, it’s good practice to search and see if any guides exist. Since the work I’m doing doesn’t involve edge cases or exotic technologies, I usually find a guide that’s fairly close to what I’m doing.

On and off the beaten path

That said, every field has its “ruts” and industry standard technologies, and the further you get from that beaten path the more manual work, intervention, and research you will need to do. To use my setup as an example:

Alpine is a less common Linux distribution
- Most documentation is written for more common server distributions like CentOS (Red Hat/IBM), Ubuntu (Canonical), and SUSE
- Less common software packages might be outdated or not available in apk, the package manager for Alpine
- Even if I choose to compile software from its source code manually, Alpine uses a non-standard compiler (musl instead of glibc) whose operation and quirks I would need to learn
Raspberry Pi is nonstandard hardware for many server applications
- It’s based on ARM architecture, which is still far behind X86 in terms of adoption and technical support for server products
- It runs a closed source version of the ThreadX operating system, meaning Linux is actually a secondary operating system
- Access to many low level hardware systems, notably temperature sensors, is not available directly to Linux. Even though the Raspberry Pi tools are supported on Alpine, learning how to get them working is another hoop

The level of technical difficulty increases the further one strays from the norm. If somebody theoretically wanted to use Babylonian numbers derived from Plimpton 322 for their trigonometric work, chances are good that they will have to put a significant amount of work into adjusting their workflow to support their use of the ancient digits.

Fortunately, my rig is not quite that uncommon. Another source of difficulty for a student like me comes from the combination of computer science being an intensely individual effort and the underappreciated challenges of emotional management in tech.

Google searches and profanity

Modern computing is obscenely complicated, and problems are constant: most things worth doing will regularly require venturing down rabbit holes into the minutiae in order to find solutions. This can be draining, frustrating, and sap motivation.

My brother once described Linux as being “powered by Google searches and profanity” and he’s right that the steep learning curve really elevates the importance of pacing and frustration management. Sometimes the pace of study is slowed down because something that “shouldn’t” be broken suddenly cuts off access to the main workflow.

I was recently very frustrated getting pgAdmin4 (a graphical interface program for postgres) to remotely connect to my server and went to sleep late upset and exhausted for several nights, trying to figure out why my connection was not working. Ultimately I found that the setting PermitTunnel = no in the file /etc/ssh/sshd_config was causing the Pi’s ssh server to block my laptop’s connections.

It might sound silly to grump and fuss about this, but lots of people give up on STEM careers and I expect that contributing factors include a lack of emotional management ability (which is in turn influenced by how and whether we talk about managing emotions) and venues where people can have experiences of community that help them to regulate themselves.

Finding out an optimal way of searching for solutions to endless tech problems is critical: it is draining to stay on one approach for too long and systematically exploit every possible solution that exists, but by jumping around too much there’s a risk of missing a solution or not understanding a dynamic that will come up again.

Sometimes disengaging and doing anything else and then coming back later, perhaps tomorrow, is the best approach.

I’ve found that with more experience I’ve developed a feeling for how much time I should spend going deep, how much time I should spend going wide, and when and how to try out different perspectives that change my approach to the problem.

Nothing in excess

Another challenge is tempering enjoyment. Even more often than they are frustrating, niche topics are a challenge for me because they are exhilirating: I love learning about technology.

I’m well in touch with my inner child, who would rather explore than get rewards. Understanding cryptography algorithms is a little beyond my current ability, but I enjoy reading about network security administration and the finer points of software like ssh.

I regularly go down 47-browser-tab rabbit holes studying the Linux kernel and I’ve made it most of the way through building Linux From Scratch. I am fascinated by assembly code and want to learn it, including how penetration techniques like stack smashing work.

There are important topics, of course. ssh is a crucial technology and will be used for the foreseeable future, cybersecurity is not getting any less important, knowledge of the Linux kernel is useful every single day, and low level optimization, including assembly, is important in fields where every bit of performance matters.

Exhaustion is exhaustion, though. Your body doesn’t know whether sleep deprivation was caused by something that makes you happy. Exhaustion caused by joy is just as real as that caused by physical exertion and frustration, but happy exhaustion is more dangerous because it feels good and society eggs us on when we wear our selves out in the name of passion.

Whether it’s donkey-like frustration or the thrill of exploration causing me to be fixated on details of marginal importance, going too far off base eventually leaves me feeling muddled, lost, and unanchored from a sense of purpose.

I don’t have a perfect recipe for managing emotions, but consciously making an effort to improve continuously seems important. Basic Sesame Street interrogative techniques like “how am I feeling today? where is that feeling in my body?” are useful, and the field of interoception informed by modern data-driven neuro research is generating lots of practical insights.

Fortunately for the rest of us, most data-driven researchers don’t have to bootstrap and administer their own servers, but my database isn’t going to install itself.

Installing postgres

A note on security: this guide is not concerned with securing a database beyond basic password protection. I am working on a home network and not exposing my database directly to the open internet.

My goal here is to learn how to run SQL queries, not to take a deep dive into how security standards and obscure configuration files interact with each other. When connecting programs with each other, tighter security settings cause cryptic errors and resolving them drains precious time and causes frustration.

There is always some risk, particularly from any compromised device on your network, but this is not a system we plan to keep around for a long time or deploy in critical or sensitive roles, so the risk is tolerable. Don’t do this over public wifi or on any untrusted network.

This guide is largely built from this well-written one. The primary difference is that I am explicitly focusing on a diskless install and using lbu to commit changes to the apk overlay so that they persist between boots.

Update system and add nano for easier config file editing, unless you like to use vi:
apk update
apk upgrade
apk add nano
Create postgres user and give them admin privileges:
adduser -h /var/lib/postgresql/ postgres
EDITOR=nano visudo
In the config file, add postgres to the sudoers section underneath root:
```
 ## User privilege specification  
 ##  
 root ALL=(ALL) ALL  
 postgres ALL=(ALL) ALL
```
Switch to the postgres account, install postgres, and create the connection socket folder:
su - postgres
apk add postgresql
mkdir /run/postgresql/
chown postgres:postgres /run/postgresql/
Add the sockets folder to lbu to persist it, because Alpine only persists /etc between reboots by default:
lbu add /run/postgresql/
Do the same for the data directory where the database files will be located and only let user postgres modify the directory:
mkdir /var/lib/postgresql/data/
chown postgres:postgres /var/lib/postgresql/data/
touch var/lib/postgresql/.psql_history
chown postgres:postgres /var/lib/postgresql/data/.psql_history
chmod 0700 /var/lib/postgresql/data/
lbu add /var/lib/postgresql/data/
lbu add /var/lib/postgresql/data/.psql_history
IMPORTANT: Run the lbu commit -d command to commit the changes we made to the Pi’s SD card. The -d flag deletes old overlay files so that they don’t hang around and junk up our SD card. It’s good practice to get in the habit of running lbu commit -d regularly so that we don’t lose work.
We are running Alpine from RAM (aka diskless), which means changes are not automatically saved to the SD card the way they are in most operating systems. This includes the information in our database. I should probably automate periodic commits with a shell script at some point.
Initialize the database with initdb -D /var/lib/postgresql/data. Some of the time zones options in the Alpine setup-timezone script cause errors with postgres:
```
 running bootstrap script ... 2021-05-28 15:42:07.769 GMT [3553] LOG:  invalid value for parameter "log_timezone": "US//Eastern"  
 2021-05-28 15:42:07.770 GMT [3553] LOG:  invalid value for parameter "TimeZone": "US//Eastern"
```
In my case, US//Eastern was not recognized, so I ran setup-timezone again and entered EST5EDT (Eastern Standard Time -5 GMT Daylight Savings Time) and the command to initialize ran successfully.
To enable remote connections, which we plan to do from RStudio and pgAdmin4, type echo "host all all 0.0.0.0/0 md5" >> /var/lib/postgresql/data/pg_hba.conf to modify the client config file. Then enter nano /var/lib/postgresql/data/postgresql.conf to edit the postgres config file. Uncomment and modify the following lines as such:
- listen_addresses = '*'
- port = 5432

Start the server with pg_ctl start -D /var/lib/postgresql/data/. This may work without issue, but I got the following error:

 localhost:/var/lib/postgresql$ pg_ctl start -D /var/lib/postgresql/data
 waiting for server to start....2021-05-28 14:18:11.231 EDT [2894] LOG:  starting PostgreSQL 13.3 on aarch64-alpine-linux-musl, compiled by gcc (Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
 2021-05-28 14:18:11.231 EDT [2894] LOG:  could not bind IPv4 address "0.0.0.0": Address in use
 2021-05-28 14:18:11.231 EDT [2894] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
 2021-05-28 14:18:11.231 EDT [2894] WARNING:  could not create listen socket for "0.0.0.0"
 2021-05-28 14:18:11.231 EDT [2894] FATAL:  could not create any TCP/IP sockets
 2021-05-28 14:18:11.245 EDT [2894] LOG:  database system is shut down
 stopped waiting
 pg_ctl: could not start server
 Examine the log output.

Port 5432 was being used by an instance of postgres that I started while writing and testing these instructions that didn’t shut down properly, so I ran sudo netstat -tulpn | grep 5432 to identify the process using port 5432 (in this instance process 2660) and then typed kill 2660. Starting the database worked normally after that.

Test the database:
psql Confirm you can log in
\c Confirm that account postgres works
SELECT datname FROM pg_database; Confirm database exists
CREATE DATABASE demo; Test database creation
SELECT datname FROM pg_database; Confirm test database was created
Add an executable script to start postgres when the system boots:
touch /etc/local.d/postgres-custom.start
chmod +x /etc/local.d/postgres-custom.start
nano /etc/local.d/postgres-custom.start
Then we’ll add this script to the file:
```
 #!/bin/sh
 su postgres -c 'pg_ctl start -D /var/lib/postgresql/data'
```
Commit our changes so that they persist, and we’ll reboot the system to test that everything is working correctly.
lbu commit -d Note that commit will take longer the more data it’s writing
reboot
Confirm that the demo database was persisted from the postgres account with the commands psql and SELECT datname FROM pg_database;. If both commands work as expected, then the database is booting and persisting as it should. Many errors can happen that prevent this, however.

Common issues

I haven’t figured out why yet, but some Alpine installs fail to persist the socket directory /run/postgresql/ between reboots. I powered down the Pi and looked at the SD card on my computer and /run/postgresql/ had been properly saved in the apk file, so lbu add and lbu commit -d are doing what they are supposed to do.
Something is happening in the operating system to cause /run/postgresql/ but not /var/lib/postgresql/ to fail to load in RAM. I have had installs where this issue occurs and installs where it does not.
If postgres expects the socket folder to exist at runtime and it does not, then the database will fail to load. When you reboot, if /run/postgresql doesn’t exist, then open /var/lib/postgresql/data/postgresql.conf and remove it from the socket directories setting:
unix_socket_directories = '/tmp'

Starting up the database manually gives the following error:

 localhost:/var/lib/postgresql$ pg_ctl start -D /var/lib/postgresql/data/
 pg_ctl: could not open PID file "/var/lib/postgresql/data/postmaster.pid": Permission denied

This happens because the file owner gets changed from postgres to some numerical value between reboots, preventing postgres from starting the database. Change it back manually with chown postgres:postgres /var/lib/postgresql/ -R. It’s a good idea to also add that command to our startup script located at /etc/local.d/postgres-custom.start so that it now reads as follows:

    #!/bin/sh
    chown postgres:postgres /var/lib/postgresql/ -R
    su postgres -c 'pg_ctl start -D /var/lib/postgresql/data'

Conclusion

Now we have an empty PostgreSQL database running on the Alpine Pi. Managing persistence with a diskless install is a bit of a headache, but some basic error parsing, scripting knowledge, and Google Fu should be enough to troubleshoot the headaches that pop up. Once the database is running it shouldn’t need to reboot very often anyway, so hopefully these issues won’t be common.

Our next steps are to connect our database with pgAdmin4, the postgres-specific GUI application. Other well-regarded programs also exist, particularly DBeaver, but I am going to focus on learning pgAdmin first. After that, we will connect the database to RStudio and begin working with queries and basic data manipulation.