Zhanga Redux

The chronicles of the work and personal life of a boring software developer with an awesome dog.

Installing Ruby 2.7 + Passenger with Apache on Centos 7

Wednesday, April 5, 2023


CentOS 7 comes with Ruby 2.0, which was released in 2013 and has been deprecated since 2016. (Even CentOS/AlmaLinux 8’s core repos only have Ruby 2.5, which is also outdated, but at least they have dnf streams which make changing the version much easier.) Here is a quick guide to installing Ruby 2.7 on CentOS 7 and getting it working with Apache + Passenger.

First, enable Software Collections (SCL), whose repo definition is provided by the default extras repo, then install Ruby from it:

# yum install centos-release-scl
# yum install rh-ruby27

At this point, Ruby 2.7 is installed, and if you want to run it locally (e.g. to run bundle), you can do so by enabling the SCL and spawning a shell:

$ scl enable rh-ruby27 bash # or other preferred shell
$ bundle install ... # or whatever other Ruby commands

Next, install Passenger by following the official guide. If you already have EPEL enabled, then it is likely you only need to install the Passenger repo and then install the Apache module:

# curl --fail -sSLo /etc/yum.repos.d/passenger.repo \
    https://oss-binaries.phusionpassenger.com/yum/definitions/el-passenger.repo
# yum install mod_passenger

At this point you will have two versions of Ruby installed, and Passenger will be using the wrong one (v2.0) by default. To get it to point to Ruby 2.7, edit /etc/sysconfig/httpd and add this line so that Apache knows about the right library path:

LD_LIBRARY_PATH=/opt/rh/rh-ruby27/root/usr/local/lib64:/opt/rh/rh-ruby27/root/usr/lib64

Then edit /etc/httpd/conf.d/passenger.conf so that Passenger sees this environment variable and points to the right Ruby binary:

PassengerRuby /opt/rh/rh-ruby27/root/usr/bin/ruby
PassEnv LD_LIBRARY_PATH

Restart Apache, and that’s all!

Tags: ruby,apache,centos,linux | Posted at 01:55 | Comments (1)

Backblaze B2 Data Corruption Bug

Friday, February 26, 2021


Over many years of faithfully entrusting my offsite backups (almost 10 TB) to Backblaze B2, I never really questioned its data integrity. It was simply inconceivable that such a widely-used data storage service could possibly return corrupted data. After all, correctly storing and returning data is basically the entire premise of B2, and yet, here we are…

This is the story of how, for an unknown (to me) amount of time, Backblaze B2 returned corrupted data for a small fraction of my files, preventing a successful restore. Luckily, I did not find out this hard fact during disaster recovery.

TL;DR: Backblaze B2 had a bug that caused it to return corrupted data for an unknown amount of time, but at least 23 days according to publicly known reports.

Duplicacy

I use Duplicacy to back up my data to Backblaze B2. Duplicacy is an officially recommended B2 client; see the B2 Integrations page. A quick summary of the relevant bits: Duplicacy performs incremental, deduplicated, and encrypted backups by producing chunks from the original files and storing those chunks in B2. Chunks are represented in B2 as files of around 1-10 MB each.
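As a toy illustration of the content-addressed idea (a sketch only; Duplicacy's actual naming scheme uses a keyed hash, not a plain SHA-256), a chunk's storage name can be derived from a hash of its contents, which is what makes deduplication cheap: identical chunks collapse to the same stored object.

```shell
# Toy sketch: name a chunk after the SHA-256 of its contents, so two
# identical chunks map to a single file name in the storage backend.
printf 'some chunk of data' > /tmp/chunk
name=$(sha256sum /tmp/chunk | awk '{print $1}')
echo "chunks/$name"
```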

In the past, I had periodically run duplicacy check -a to verify that every chunk exists, but this does not verify their contents. Downloading data from B2 is relatively costly ($100 for the 10 TB), and I figured that if my data made it to B2, it was probably fine…

A few days ago, I decided it was time to perform a full verification of my backup data just to be safe. After downloading a whole lot of data and verifying their bits (and racking up a B2 bill in the process), Duplicacy reported a bunch of failures like this:

$ duplicacy check -r 265 -chunks -threads 32 -persist
...
Failed to decrypt the chunk aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f: cipher: message authentication failed; retrying
Failed to decrypt the chunk aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f: cipher: message authentication failed
...
42 out of 1405935 chunks are corrupted

Duplicacy writes a MAC into each chunk, and the error message tells us that the MAC doesn’t match the chunk’s contents. Somewhere along the pipeline, some bits have changed.
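The principle can be demonstrated with a generic keyed hash (a sketch only; Duplicacy's real chunk format and key derivation differ, and "secret" below is a placeholder key): change even one byte of the chunk and the MAC no longer matches.

```shell
# Sketch: a keyed MAC over a chunk changes completely if any byte of the
# chunk changes, which is how corruption is detected at verification time.
printf 'chunk data' > /tmp/chunk
mac_good=$(openssl dgst -sha256 -hmac secret /tmp/chunk | awk '{print $NF}')
printf 'chunk date' > /tmp/chunk          # simulate a corrupted byte
mac_bad=$(openssl dgst -sha256 -hmac secret /tmp/chunk | awk '{print $NF}')
[ "$mac_good" = "$mac_bad" ] || echo "message authentication failed"
```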

I then noticed a very odd behavior: if I attempted to verify just those 42 corrupted chunks, a few more would successfully verify, and if I ran it again a few more would verify, and so on, until after about 20 tries, 38 of the 42 corrupt chunks eventually returned as verified. Duplicacy tries each chunk 4 times before giving up, so 4 chunks in my backup set still had yet to be successfully verified even after trying to download them 80 times.

I made a post on the Duplicacy forum as I was certain I must have been doing something wrong, or perhaps Duplicacy was buggy. Gilbert, the developer (who is very helpful, by the way), suggested it might be memory corruption and asked that I try again without -threads 32 and to try a different machine. I did so… with the same results.

Backblaze B2

Here’s where it gets interesting with respect to B2. Before proceeding, here’s a little background on Duplicacy and B2:

  • B2’s b2_upload_file API accepts a client-computed SHA1 hash of the file, and it will reject uploads received with a mismatched hash. This should prevent corrupted data from being written to B2 in the first place, unless of course the corruption happens before the client computes the SHA1.
  • Duplicacy always provides this SHA1 on uploads.
  • Upon download, aside from the file contents, B2 also returns the SHA1 that was originally supplied.
  • The official B2 CLI tool checks the hash after download and throws an error if verification fails.
  • Duplicacy does not check the B2 hash on download, but instead checks its own MAC inside the file, which in this case serves the same function: rejection of corrupted downloads.
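The post-download check the B2 CLI performs amounts to something like this sketch (the "stored" SHA1 below is the genuine SHA1 of the 5-byte sample, standing in for the hash B2 recorded at upload time):

```shell
# Sketch of the B2 CLI's post-download verification: hash the received
# bytes and compare against the SHA1 that was supplied at upload time.
printf 'hello' > /tmp/download
stored_sha1=aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d   # recorded at upload
actual_sha1=$(sha1sum /tmp/download | awk '{print $1}')
if [ "$actual_sha1" = "$stored_sha1" ]; then
    echo "checksum matches"
else
    echo "sha1 checksum mismatch -- bad data"
fi
```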

I took the list of 38 corrupted-but-later-verified chunks and used the B2 CLI to try to download each chunk one by one. Most of them failed to download and the CLI tool reported sha1 checksum mismatch — which should never happen, as this indicates data corruption somewhere between B2’s successful acceptance of the upload and the post-download hash verification on the client — but 3/38 downloaded successfully.

Even more alarmingly, I managed to find a chunk that flip-flopped between sha1 checksum mismatch and successful download within seconds:

[user@host duplicacy]$ b2 download-file-by-name [bucket-name] chunks/b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c 1
1: 100%|███████████████████████████████████| 3.27M/3.27M [00:00<00:00, 44.9MB/s]
ConsoleTool command error
Traceback (most recent call last):
  File "b2/console_tool.py", line 1521, in run_command
  File "b2/console_tool.py", line 690, in run
  File "logfury/v0_1/trace_call.py", line 84, in wrapper
  File "b2sdk/bucket.py", line 170, in download_file_by_name
  File "logfury/v0_1/trace_call.py", line 84, in wrapper
  File "b2sdk/transfer/inbound/download_manager.py", line 122, in download_file_from_url
  File "b2sdk/transfer/inbound/download_manager.py", line 134, in _validate_download
b2sdk.exception.ChecksumMismatch: sha1 checksum mismatch -- bad data
ERROR: sha1 checksum mismatch -- bad data
[user@host duplicacy]$ b2 download-file-by-name [bucket-name] chunks/b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c 2
2: 100%|███████████████████████████████████| 3.27M/3.27M [00:00<00:00, 49.9MB/s]
ConsoleTool command error
Traceback (most recent call last):
  File "b2/console_tool.py", line 1521, in run_command
  File "b2/console_tool.py", line 690, in run
  File "logfury/v0_1/trace_call.py", line 84, in wrapper
  File "b2sdk/bucket.py", line 170, in download_file_by_name
  File "logfury/v0_1/trace_call.py", line 84, in wrapper
  File "b2sdk/transfer/inbound/download_manager.py", line 122, in download_file_from_url
  File "b2sdk/transfer/inbound/download_manager.py", line 134, in _validate_download
b2sdk.exception.ChecksumMismatch: sha1 checksum mismatch -- bad data
ERROR: sha1 checksum mismatch -- bad data
[user@host duplicacy]$ b2 download-file-by-name [bucket-name] chunks/b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c 3
3: 100%|███████████████████████████████████| 3.27M/3.27M [00:00<00:00, 47.1MB/s]
File name:    chunks/b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c
File id:      [file-id]
File size:    3267110
Content type: application/octet-stream
Content sha1: f1d2d2f924e986ac86fdf7b36c94bcdf32beec15
checksum matches

A few other interesting characteristics of these broken files were:

  • For affected chunks, a successful download did not imply further successful downloads. In other words, downloading the same chunk repeatedly may succeed once but then fail repeatedly after that.
  • Failures seemed correlated in time; trying to download failed chunks repeatedly in a loop did not seem to help (I tried 4 arbitrarily-chosen chunks 100 times each and got 400 SHA1 mismatches), but waiting hours or days seemed to improve the likelihood of possibly getting a successful download.

Just to make sure it wasn't something to do with my local setup, I tried the B2 CLI client not just on my regular backup machine (one with ECC RAM and RAID 6), but also on two other physical machines, including one with a different OS and another tethered to my cell data connection instead of my landline Internet, all with the same results.

At this point, it seemed like a bug on Backblaze’s end must be the only explanation, so I submitted a support ticket.

Backblaze’s Response

Backblaze confirmed to me that this issue was affecting multiple customers. Gilbert (the Duplicacy developer) pointed me to issue #3268 on Restic (another backup client, similar to Duplicacy) where users encountered the same data corruption, and some more detail is provided in that thread.

Essentially what is happening is that files in B2 are sharded into 20 pieces, any 17 of which can be used to reconstruct the file, i.e. up to 3 shards can fail before data loss occurs. Backblaze did not verify checksums when reading data for download; instead, an asynchronous job scanned for corruption, and a bad batch of hard drives caused that job to run more slowly than usual, lengthening the window during which corrupted data was being served.

The good news is that it seems very unlikely any data was permanently lost; however, it's quite surprising to me that Backblaze went so many years without verifying reads, knowingly returning corrupted data on a small percentage of downloads.

On March 1, 2021, a fix was applied which added checksum verification on download. All of my files seem to be readable now, so hopefully this particular issue is fixed for everyone.

Tags: backblaze | Posted at 22:08 | Comments (0)

Windows fails to format USB drive that previously contained an ISO image

Thursday, February 11, 2021


I recently tried to format a USB flash drive for use in Windows, only to find that none of the usual tools (Explorer, Disk Management) would work. Even repartitioning the drive in Linux using fdisk didn't help.

Symptoms include:

Explorer doesn't show a drive letter when the drive is inserted, even if formatted FAT or NTFS.

Attempting to create a partition using Disk Management (New Simple Volume) may fail with:

The operation is not supported on a non-empty removable disk.

Or it may succeed, but the associated format step may fail and a dialog will pop up:

The volume was created successfully but it was not formatted. Retry the format using the format task.

Attempting to format the partition using Disk Management produces this error dialog:

The system cannot find the file specified.

diskpart's CLEAN fails with an error similar to:

DiskPart has encountered an error: The system cannot find the file specified. See the System Event Log for more information.

The log in Event Viewer/Windows Logs/System has Source "VDS Basic Provider" and message:

Cannot zero sectors on disk \\?\PhysicalDrive2. Error code: 5@0101000F

(All messages above taken from Windows 10.)

To fix this, plug in the drive and run diskpart. Select the appropriate disk:

LIST DISK
SELECT DISK #

Then try CONVERT:

CONVERT MBR

If that doesn't work, try CONVERT GPT or CLEAN. Good luck!
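If you'd rather fix it from Linux, zeroing the start of the drive wipes the ISO's leftover hybrid partition table, after which Windows can partition it normally. The demo below runs against a scratch image file; point DEV at the real device (e.g. /dev/sdX, double-checked with lsblk) at your own risk, since this destroys whatever partition table is there.

```shell
# Demonstrated on a scratch image; substitute the real USB device
# (verify with lsblk first!) to wipe the ISO's leftover partition table.
DEV=/tmp/usb-scratch.img                  # stand-in for /dev/sdX
dd if=/dev/zero of="$DEV" bs=1M count=1 2>/dev/null
ls -l "$DEV"
```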

Tags: windows, filesystem | Posted at 10:43 | Comments (0)

Encrypting and decrypting PDFs using QPDF

Saturday, March 14, 2020


If you have a password-protected PDF and want to generate a decrypted version, it’s very simple with QPDF:

$ qpdf --password=123 --decrypt input.pdf output.pdf

To avoid entering the password on the command line (perhaps you don’t want it saved in your shell history):

$ qpdf @- --decrypt input.pdf output.pdf

Then enter --password=123 and hit enter and ^D.

To encrypt a PDF, do this:

$ qpdf --encrypt userPw ownerPw 256 -- input.pdf output.pdf

Similarly, to avoid including the passwords in the command, do this:

$ qpdf --encrypt @- @- 256 -- input.pdf output.pdf

And then type the user password, hit enter, the owner password, enter, and ^D.

Tags: pdf | Posted at 22:04 | Comments (12)

Disabling kdump to reclaim missing RAM on CentOS 8

Saturday, January 25, 2020


After setting up a Droplet on DigitalOcean (a VPS) using their CentOS 8 image, I found that various sources (like top, free, and even /proc/meminfo) were reporting only 821 MB of total RAM, even though the instance should have had 1 GB. Where did the missing ~200 MB go?

It turns out that kdump is enabled by default. In short, it uses a second kernel to capture dumps in case the running kernel crashes. I don't need this since I'm not going to do anything useful with those dumps anyway, so here's how to disable it and get the memory back.

First, check to see if it's enabled by looking for a nonzero value in /sys/kernel/kexec_crash_size:

$ cat /sys/kernel/kexec_crash_size

You'll also see a line like this in dmesg:

kernel: Reserving 160MB of memory at 672MB for crashkernel (System RAM: 1023MB)

To disable it, edit /etc/default/grub and change crashkernel=auto to crashkernel=no, then:

# grub2-mkconfig -o /boot/grub2/grub.cfg
# systemctl disable kdump
# reboot

On EL9 (AlmaLinux 9, Rocky Linux 9, etc) with BLS, try grubby instead:

# grubby --info=DEFAULT
# grubby --update-kernel ALL --args 'crashkernel=no'
# reboot

Now you should have your memory back!
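As a quick sanity check after the reboot, the crash-kernel reservation should read zero and MemTotal should be back near the full 1 GB:

```shell
# After the reboot, the crash-kernel reservation should be zero and
# MemTotal in /proc/meminfo should reflect the reclaimed memory.
cat /sys/kernel/kexec_crash_size 2>/dev/null || echo "(no kexec support)"
grep MemTotal /proc/meminfo
```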

Tags: centos, linux | Posted at 18:16 | Comments (1)