This post describes the setup of a Raspberry Pi running Alpine Linux. I will later install PostgreSQL so I can run “real” SQL queries on a live database and build the practical database manipulation and data wrangling skills that are critical for a good analyst.
Skills in data manipulation, analysis, and visualization are also the foundation of more complex work, including statistical analysis, data engineering, geographic information systems (GIS), and data science.
The combination of simple projects and regular posts will also act as a professional communication exercise and work portfolio demonstrating domain proficiency, writing ability, and thought processes.
Why this project
For a few years now, I have been following academic research across a range of environmental fields, and I am familiar with many examples of how modern data tools and techniques are influencing other research areas.
This nexus of data, environmental work, and research commercialization fascinates me, and so at the beginning of this year I decided to specialize in data work related to the environment.
While the work of researchers is important and related to the larger work I want to do, my experience in higher education administration convinced me that I do not want to be a researcher myself.
I believe that supporting this work as a data professional, either as a freelancer, consultant, teacher, or in a relevant corporate setting, will be more enjoyable and sustainable for me and likely more impactful.
Since the areas that I want to focus on in the near future (data analysis, data visualization, statistical applications, and machine learning) are much more applied in nature, portfolios are a common way to demonstrate competence and communication ability and to showcase project work.
I have started exploring open datasets available from various sources and believe that these are a good starting point for building projects. I will approach these datasets with several questions in mind:
- How can I use this to demonstrate technical ability?
- What questions can this dataset answer, and why are they important?
- What stories are in this data and how can I tell them best?
- How can I create something out of this that is beautiful, compelling, and entertaining?
In undertaking work like this, it is important to find a balance between big-picture strategic concerns and the nitty-gritty details. It is also important to strike a balance between exploration and focus: by habit I explore more than I focus, and so creating this blog is also an effort to push myself to focus on concrete products.
Into the nitty-gritty
I will write more about how I am “project managing” myself and my efforts, but I have been focusing first on understanding the larger landscape of contemporary data work. Tool-wise, I have been studying the R programming language for analysis and discovering the ecosystem that exists around it, including the `blogdown` package I am using to create this blog.
Competence in using R is important, but training exercises with their well-structured and well-behaved data are not representative of the real world. Most data worth studying has to be dragged into the analysis kicking and screaming.
To that end, I have started brushing up on my limited SQL skills and will need a “real” database to run queries against. Eventually, I may be able to use it for web crawling, ingesting data, and other more intensive work, but for now I will consider the project successful if I can connect to the database, import existing CSV-formatted datasets into tables, and successfully run queries against them.
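To make that success criterion concrete, here is roughly what it should look like from a shell once the server is up. This is a sketch only: the host, user, database, table, and CSV file names are all placeholders.

```sh
# Connect to a hypothetical "sandbox" database, create a table,
# load a CSV (client-side, with a header row), and run a query.
psql -h raspberrypi -U postgres -d sandbox <<'SQL'
CREATE TABLE observations (
    site     text,
    reading  numeric,
    taken_at date
);
\copy observations FROM 'observations.csv' WITH (FORMAT csv, HEADER true)
SELECT site, avg(reading) AS mean_reading
FROM observations
GROUP BY site;
SQL
```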
Hardware and software choices
In order to run learning projects involving SQL, I am going to set up a basic home server on a Raspberry Pi. The Pi has plenty of quirks and drawbacks, but it is cheap, I already have one, and it is powerful enough for the tasks at hand.
In addition to studying SQL, I am also interested in (eventually) learning more about containerization and cloud deployment using technologies like Docker, so I will use Alpine Linux, a small, security-oriented distribution that is popular as a base for containers and can also run entirely from RAM, making it fast. Plus, I like trying new distros.
There are many well-regarded enterprise-grade database programs, and I do not have enough experience with database administration in a business context to have a strong preference among them. I am picking PostgreSQL for the database for the following reasons, in no particular order:
- It is widespread and proven technology
- It is widely taught, making it easier to find courses, guides, and forum posts for specific issues
- It is free and open source software (FOSS)
- Its SQL dialect is fairly standard
- It has good documentation
- It has grown in popularity in recent developer surveys and job posting estimates
Install Alpine to SD
(If you are not interested in reading highly specific technical instructions, feel free to stop reading here.)
The base system of the database will be Alpine Linux running on a Raspberry Pi. Since I will be connecting remotely, there will be no need for a desktop environment and I will run the server headless (no monitor or keyboard).
I used this guide for preparation of the SD card and this headless installation guide from the Alpine wiki.
- Get the tarball from Alpine. Since we are using a more recent Raspberry Pi 4, download the `aarch64` build (`alpine-rpi-3.13.5-aarch64.tar.gz` as of writing).
- Open the GNOME Disks utility (package `gnome-disks`, but `dd` can also be used; see the command-line sketch after this list), click the triple-dot icon, and select ‘Format Disk.’
- In the ‘Erase’ dropdown, select either Quick or Slow; it makes no difference for our purposes here. For the ‘Partitioning’ dropdown, select the ‘GPT’ option.
- Click the + icon to create a partition in the unallocated space. I am using a 32 GB SD card and allocated all available space to the partition. The volume label can be anything; `ALPINE` is fine. The type of partition is ‘FAT’. Once the new partition is created, it may be automatically mounted. You should see a ⏵ icon below the Volumes diagram. If you see a square icon, click it to unmount the SD card.
- Keep Disks open and switch to an archiving utility to extract the tarball onto the SD card’s root directory. I used another GNOME utility, Archive Manager (`file-roller`), to do this. Be careful not to extract the `.` directory itself to the SD card; instead, extract the contents of the directory.
- While the drive is still unmounted, open a terminal and run the command `lsblk` to list your devices. SD cards will usually be `sdX`, where `X` is a letter; in my case the device is `mmcblk0` and the partition is `mmcblk0p1`.
- Run the command `fatlabel /dev/mmcblk0p1 ALPINE` to change the volume name, substituting whatever value your partition was assigned. This is necessary because of a current bug in the RPi firmware. The solution came from this issue. If you do not change the volume name, or if you do not unmount the volume before running `fatlabel`, then your Pi will not boot and the red indicator light will stay solid.
- Back in Disks, click the ⏵ icon to mount the drive. We are done with the Disks program now.
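For reference, the same preparation can be done entirely from the terminal instead of Disks. This is a rough, untested equivalent: it assumes the card shows up as `/dev/mmcblk0` and the tarball is in the current directory, and it will erase the card.

```sh
# Create a GPT table with a single FAT32 partition spanning the card.
parted --script /dev/mmcblk0 mklabel gpt mkpart ALPINE fat32 1MiB 100%
# -n sets the FAT volume label the RPi firmware needs (no fatlabel step).
mkfs.vfat -n ALPINE /dev/mmcblk0p1
# Mount the partition and unpack the tarball contents onto it.
mount /dev/mmcblk0p1 /mnt
tar xzf alpine-rpi-3.13.5-aarch64.tar.gz -C /mnt
umount /mnt
```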
At this point, you could hook the Pi up to a keyboard, monitor, and Ethernet jack and it should boot properly. To run the server headless, with no setup done at a local command line, we need to do some extra work.
Create headless overlay
- On your local computer (not on the SD card), type `mkdir etc` from the terminal to create a directory.
- Inside this directory, type `touch .default_boot_services` and then `mkdir local.d runlevels` to create the next level of files. Type `ls -a` to confirm that all three entries were created.
- Type `cd runlevels` to navigate to that directory, and type `mkdir default` to create a new directory. Type `cd default` to navigate there. In this directory, create a symlink by typing `ln -s /etc/init.d/local local`.
- Move to `etc/local.d` by typing `cd ../../local.d` and then `touch headless.start` to create a script file.
- Type `nano headless.start` to open the file, copy in the following script, and type `Ctrl+o` and `Enter` to save, then `Ctrl+x` to exit. Type `cd ..` to return to `etc`.
```sh
#!/bin/sh
# headless.start -- run at boot by the "local" service to bring up
# networking and SSH on a Pi with no monitor or keyboard attached.

# Write a basic /etc/network/interfaces for the chosen interface.
__create_eni()
{
    cat <<-EOF > /etc/network/interfaces
auto lo
iface lo inet loopback
auto ${iface}
iface ${iface} inet dhcp
hostname localhost
EOF
}

# Write a minimal wpa_supplicant config from the wifi.txt credentials.
__create_eww()
{
    cat <<-EOF > /etc/wpa_supplicant/wpa_supplicant.conf
network={
ssid="${ssid}"
psk="${psk}"
}
EOF
}

# Temporarily allow passwordless root logins so we can SSH in once.
__edit_ess()
{
    cat <<-EOF >> /etc/ssh/sshd_config
PermitEmptyPasswords yes
PermitRootLogin yes
EOF
}

# Echo the name of the first wireless interface found, if any.
__find_wint()
{
    for dev in /sys/class/net/*
    do
        if [ -e "${dev}/wireless" ] || [ -e "${dev}/phy80211" ]
        then
            echo "${dev##*/}"
        fi
    done
}

# Find the directory containing the apkovl overlay (the SD card).
# The glob is quoted so the shell does not expand it prematurely.
ovlpath=$(find /media -name '*.apkovl.tar.gz' -exec dirname {} \;)

# wifi.txt holds the SSID and passphrase on a single line; it may be
# empty for wired setups, but it must exist.
read ssid psk < "${ovlpath}/wifi.txt"

if [ -n "${ssid}" ]
then
    iface=$(__find_wint)
    apk add wpa_supplicant
    __create_eww
    rc-service wpa_supplicant start
else
    iface="eth0"
fi

__create_eni
rc-service networking start

# Set up OpenSSH, loosen the config just long enough for a first login,
# then put the original config back for the next restart.
/sbin/setup-sshd -c openssh
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.orig
__edit_ess
rc-service sshd restart
mv /etc/ssh/sshd_config.orig /etc/ssh/sshd_config
```
Now your directory tree is complete, and typing `tree -a` should give the following output:

```
penguin@localhost:~/etc$ tree -a
.
├── .default_boot_services
├── local.d
│   └── headless.start
└── runlevels
    └── default
        └── local -> /etc/init.d/local
```
- From the working directory (where we created `etc`, but not inside `etc`), create a gzipped tarball by running `tar czvf headless.apkovl.tar.gz etc/` and then copy the resulting file to the root directory of the SD card. `apkovl` files are overlay files used by Alpine to persist data between reboots; you can read more about how they work here.
- Even though I am not using wifi, the above script will fail if it does not find a file called `wifi.txt` in the root directory, so create an empty file with `touch wifi.txt` and copy it to the SD card’s root directory. (If you do want wifi, see the sketch after this list for the expected format.)
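For anyone who does want the Pi on a wireless network, the `read ssid psk` line in the script expects `wifi.txt` to contain the network name and passphrase separated by a space on a single line. The values below are placeholders (note that an SSID containing spaces will not parse correctly):

```
HomeNetwork my-wifi-passphrase
```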
Now we are ready to log into the system.
Logging in and basic setup
- Unmount the SD card and eject it, then insert it into the Pi. Connect the Ethernet cable and power on the device.
- After a few minutes, check your router for new DHCP leases. The process for doing this varies by vendor, but the address will generally have the format `192.168.###.###` (see the sketch after this list for one way to scan for it). If you are connected to a VPN, disconnect and attempt to log in by typing `ssh root@192.168.###.###`.
- You will be asked to accept a fingerprint; type `yes` and you should then be logged in.
- Instead of running the standard `setup-alpine` collection of scripts to set up the system, we will avoid `setup-sshd` and `setup-interfaces` by manually running the other setup scripts:
  - `setup-ntp`
  - `setup-keymap`
  - `setup-hostname`
  - `setup-timezone`
  - `setup-apkrepos`
  - `setup-lbu`
  - `setup-apkcache`
- In my case, the `chronyd` timekeeping service does not start automatically after running `setup-ntp`, which can be confirmed by typing `date`. As a result of the time being wrong, I get an error when I run `setup-apkrepos`. I correct this by running `rc-service chronyd restart`. `date` now shows the correct time and `setup-apkrepos` runs correctly.
- We will commit changes by typing `lbu commit -d` so that they persist between reboots, and then type `reboot` to restart the system and confirm that our changes persisted.
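If the router's interface makes lease-hunting awkward, a ping scan from another machine on the same network is one alternative. A sketch, assuming `nmap` is installed and your subnet is `192.168.1.0/24` (adjust to match yours):

```sh
nmap -sn 192.168.1.0/24   # ping scan only; lists live hosts, no port scan
ssh root@192.168.1.42     # then try the newly appeared address
```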
At this point, we have a basic installation running. Alpine does very little by default, so many things that we will want to do are not set up. The Post-Install section of the Installation page on the Alpine wiki is a good place to start. In the next post, we will set up PostgreSQL.
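As a rough preview of the next post (untested here, and details such as initializing the data directory may vary by Alpine release), the PostgreSQL installation itself should come down to a few commands:

```sh
apk add postgresql                 # server and client tools
rc-update add postgresql default   # start the service at boot
rc-service postgresql start        # may need an initdb step first
lbu commit -d                      # persist the changes, as above
```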