dxdt.ru: an entertaining online journal ([syndicated profile] dxdt_feed) wrote 2025-06-04 05:39 pm

The antiquity of texts as a “blockchain”

Posted by Александр Венедюхин

If the most complete known text of the “Iliad” (for example) is a tenth-century manuscript (Venetus A), how do we establish that the “Iliad” was written not in the tenth century but much earlier? Besides quotations in other works, for some of which copies older than the tenth century survive, one relies, for example, on the fact that substantially older papyrus fragments containing bits of the “Iliad” text have been found. Or at least the fragments on the papyri can be read as fragments of the “Iliad”. These fragments fit into the text of the tenth-century manuscript. Some fit very precisely, for instance because enough words have survived in them. Others fit less precisely and less unambiguously than one would like. Naturally, not only papyri qualify but also inscriptions on other objects; papyri are just the example here.

In general, it is not that hard to pick out very short fragments from other texts which, cut up suitably, will coincide with the “Iliad”. An example I found fairly quickly is even on dxdt.ru (with pictures). Nothing surprising about that: letter-for-letter matching fragments a few words long, which are not direct quotations, can always be found in any two sufficiently large literary texts in a natural language. First, to rule out such coincidences one would have to set that as an explicit goal and start generating synthetic text along the lines of “the rare redan of the reduction-gear editor’s radish… and so on”; but, second, even with that goal nothing would come of it over any appreciable span: some words are bound to fall into a stock construction, because any chain built with uniqueness in mind starts to collapse under grammatical rules and the demands of meaning (by the terms of the problem, the text is literary). Nevertheless, fragment matching is a fairly reliable tool; one only needs to apply it correctly. Its reliability grows with the number of letters available in the fragments.

In fact, an essential role in attributing those papyri with “Iliad” fragments is played by the very arrangement of the letters in a fragment. These letter symbols assemble into pieces of words, and the relative placement of the pieces resembles their placement in the generally accepted, canonical way of writing out the “Iliad” text. So, to shift the tenth-century text of the “Iliad” into the past artificially, one would somehow have to “recompute” all these fragments and their matches, arranging the required elements on the papyri. The result is something like a blockchain with hash functions: to make a change, you have to “recompute” a lot of data that has spread across the papyri, which is, so to speak, “computationally hard”. And, obviously, some papyri had not yet been found at the moment the main text was fixed. None of this negates the fact that the canonical text of the “Iliad” is the result of editing, with some fragments and variant readings deliberately discarded. The point is different: a substantial change would require re-accounting for the traces already left behind: some would have to be removed, others produced anew and passed off as “old”, quietly buried somewhere in ancient ruins.

Could one nevertheless do this for the “Iliad”, say, with modern methods? For instance, take various automatic image generators, produce lots of images of papyri and other inscriptions, claim that it has all simply been digitized from an archive, and display it over the Internet. And with 3D printers and other clever tools, manufacture many fake scraps of papyrus and inscriptions on stone tablets and clay amphorae. It can be done, but all of it costs resources. This is not about alternative chronology but about embedding a given text into older layers: careful embedding turns out to require “recomputing” the matching elements, placing words on different objects in a mutually consistent way. Clearly, all of this applies not just to texts but to archaeological work in general. Texts are simply stronger here, because they constrain the room for interpreting the facts through history’s well-known property of “being a function of the present”.

vak: (Ukraine)
Serge Vakulenko ([personal profile] vak) wrote 2025-06-04 11:22 am

Documentary footage

The footage captures strikes by SBU FPV drones on four enemy airfields: Olenya, Ivanovo, Dyagilevo, and Belaya, where the aggressor state based the strategic aviation that regularly bombards peaceful Ukrainian cities.

Among the aircraft hit are the A-50, Tu-95, Tu-22, and Tu-160, as well as the An-12 and Il-78.

https://youtu.be/y-ksNjIAkJo
phd_ru: (Default)
phd_ru ([personal profile] phd_ru) wrote 2025-06-04 08:25 pm

The usual perils

I got to thinking about the origin of the Russian word "перила" (handrails) and guessed it comes from "peril". After all, they are protection against all sorts of perils.

Vasmer thinks it comes from the verb "переть" (to push). Unexpected.

X-posted to LJ.
Rybinsk Television Service "Rybinsk-40" ([syndicated profile] ryb40_feed) wrote 2025-06-04 07:39 pm

REORGANIZATION OF STUDENT DORMITORIES

One dormitory of the Pavel Solovyov RGATU will undergo a major renovation, while the other is slated for demolition in the future.
Rybinsk Television Service "Rybinsk-40" ([syndicated profile] ryb40_feed) wrote 2025-06-04 07:36 pm

ON THE OPERATION OF THE REGION'S EDUCATIONAL COMPLEXES

The last bells have rung in the region's schools, and preparations for the new academic year have begun.
Rybinsk Television Service "Rybinsk-40" ([syndicated profile] ryb40_feed) wrote 2025-06-04 07:33 pm

TAKE A STEP TOWARD HEALTH

Check your health at a convenient time and without a trip to the clinic.
Rybinsk Television Service "Rybinsk-40" ([syndicated profile] ryb40_feed) wrote 2025-06-04 07:31 pm

A GATHERING PLACE FOR ACTIVE YOUTH

The Spasskoye park hotel has once again hosted "Primer", the forum for the active working youth of PJSC ODK-Saturn.
Planet PostgreSQL ([syndicated profile] planetposgresql_feed) wrote 2025-06-04 03:58 pm

Michael Christofides: Approximate the p99 of a query with pg_stat_statements

I recently saw a feature request for pg_stat_statements to be able to track percentile performance of queries, for example the p95 (95th percentile) or p99 (99th percentile).

That would be fantastic, but isn’t yet possible. In the meantime, there is a statistically-dodgy-but-practically-useful (my speciality) way to approximate them using the mean and standard deviation columns in pg_stat_statements.

Why bother?

When wondering what our user experience is like across different queries, we can miss issues if we only look at things by the average time taken.

For example, let’s consider a query that takes on average 100ms but 1% of the time it takes over 500ms (its p99), and a second query that takes on average 110ms but with a p99 of 200ms. It is quite possible that the first query is causing more user dissatisfaction, despite being faster on average.

Brief statistics refresher

The standard deviation is a measure of the amount of variation from the mean. Wider distributions of values have larger standard deviations.

pg_stat_statements has mean_exec_time (mean execution time) and mean_plan_time (mean planning time) columns, but no median equivalents. The other columns we’ll be using for our approximation calculation are stddev_exec_time and stddev_plan_time.

In a perfectly normally distributed data set, the p90 is 1.28 standard deviations above the mean, the p95 is 1.65, and the p99 is 2.33.

Our query timings are probably not normally distributed, though. In fact, many will have a longer tail on the slow end, and some will have a multimodal distribution (with clustering due to things like non-evenly distributed data and differing query plans).

Having said that, even though many of our query timings are not normally distributed, queries with a high p99 are very likely to also have a high mean-plus-a-couple-of-standard-deviations, so if we approximate the p99 assuming a normal distribution, the results should be directionally correct.

Just give me the query already

Here’s a simple query to get our top 50 queries by their approximate p99 timings:

select	mean_exec_time::int,
	mean_plan_time::int,
	stddev_exec_time::int,
	stddev_plan_time::int,
	((mean_exec_time + mean_plan_time) + 
	2.33 * (stddev_exec_time + stddev_plan_time))::int 
	as approx_p99, 
	calls,
	query
from	pg_stat_statements
where	calls > 100
order by approx_p99 desc
limit 50;

Here I’ve summed the mean execution and planning times, and added 2.33 times the sum of their standard deviations. If you’d like to approximate a different percentile, simply substitute the multiple.
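
For instance, a minimal sketch of the same query approximating the p95 instead (using the same pg_stat_statements columns and the 1.65 multiple mentioned earlier; only the multiplier and alias change):

select	mean_exec_time::int,
	stddev_exec_time::int,
	((mean_exec_time + mean_plan_time) +
	1.65 * (stddev_exec_time + stddev_plan_time))::int
	as approx_p95,
	calls,
	query
from	pg_stat_statements
where	calls > 100
order by approx_p95 desc
limit 50;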

Since the timings statistics are in milliseconds, I like to round them to integers to make them easier to scan (at the cost of a little precision, of course).

I also like to filter out queries that have not been executed much, since we usually want to focus on our more common queries. Also, standard deviations and percentile metrics naturally make less sense at low volumes.

If you run the above and notice that your planning times are all zero, you may want to look into the pg_stat_statements track_planning setting, which is off by default.
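
As a rough sketch of turning it on (assuming PostgreSQL 13 or newer, a superuser connection, and that pg_stat_statements is already loaded via shared_preload_libraries), it could look like this:

-- Enable planning-time tracking; takes effect after a configuration reload.
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();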

Alternatives

While I think this provides a nice proxy metric, and is likely more useful than ordering by mean time alone, there are some very reasonable alternatives.

For example, it may simply be better to monitor p95 and p99 type metrics from the application or user perspective, for example in an APM (Application Performance Monitoring) or RUM (Real User Monitoring) tool.

On the database-centric side, another option is the pg_stat_monitor extension, which includes a histogram feature that makes approximating percentiles more accurate. The extension is relatively new though (at least compared to pg_stat_statements!) and as such is both less battle-tested and not yet available in many environments.

Conclusion

In short, we can approximate percentile statistics like p95 and p99 using the mean and stddev columns in pg_stat_statements. For some use cases, this will likely be a more useful thing to order by (or monitor) than the mean time alone.

As discussed, this method does have obvious flaws, not least that the data is unlikely to be normally distributed. As such, it’s wise to clearly label it as an approximation.

Planet PostgreSQL ([syndicated profile] planetposgresql_feed) wrote 2025-06-04 03:26 pm

Claire Giordano: Ultimate Guide to POSETTE: An Event for Postgres, 2025 edition

POSETTE: An Event for Postgres 2025 is back for its 4th year—free, virtual, and packed with deep expertise. No travel needed, just your laptop, internet, and curiosity.

This year’s 45 speakers are smart, capable Postgres practitioners—core contributors, performance experts, application developers, Azure engineers, extension maintainers—and their talks are as interesting as they are useful.

The four livestreams (42 talks total) run from June 10-12, 2025. Every talk will be posted to YouTube afterward (un-gated, of course). But if you can join live, I hope you do! On the virtual hallway track on Discord, you’ll be able to chat with POSETTE speakers—as well as other attendees. And yes, there will be swag.

This “ultimate guide” blog post is your shortcut to navigating POSETTE 2025: a by-the-numbers summary, the keynotes, the talk categories, where to find the schedule, how to watch and participate, and what’s new this year.

“By the numbers” summary for POSETTE 2025

Here’s a quick snapshot of what you need to know about POSETTE:

About POSETTE: An Event for Postgres 2025
  • 3 days: June 10-12, 2025
  • 4 livestreams: in Americas & EMEA time zones (but of course you can watch from anywhere)
  • 42 talks: all free, all virtual
  • 2 keynotes: from Bruce Momjian & Charles Feddersen
  • 45 speakers: PG contributors, users, application developers, community members, & Azure engineers
  • 17.4% CFP acceptance rate: 40 talks selected from 230 submissions
  • 26% Azure-focused talks: 11 talks out of 42 feature Azure Database for PostgreSQL
  • 74% general Postgres talks: 31 talks are not cloud-specific at all
  • 16 languages: published videos will have captions available in 16 languages, including English, Czech, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Turkish, Ukrainian, Chinese Simplified, and Chinese Traditional

To give you a feel for the high-level categories and detailed “tags” that help you navigate all 42 talks, the diagram in Figure 1 may help.

Figure 1: Ultimate Guide for POSETTE 2025, with high-level categories and detailed tags for all 42 talks.

2 Amazing Keynotes

If you’re interested in what Microsoft is building for Postgres these days, then Charles Feddersen’s keynote is a must-watch. And in spite of all the hype about AI, you’re guaranteed to enjoy Bruce Momjian’s keynote about databases in the AI trenches.

18 Postgres Core talks

  • Performance
  • Postgres internals
  • Replication
  • Community
  • Fun

12 Postgres Ecosystem talks

  • Analytics
  • App dev
  • Extensions
  • Patroni
  • VS Code

10 Azure Database for PostgreSQL talks

  • AI-related talks
  • Customer talks
  • Flexible Server talks
  • Oracle to Postgres talks

Where to find the POSETTE Schedule

You may be thinking, “I know how to use a website, Claire.” Fair. But hear me out: the POSETTE 2025 Schedule page has 4 tabs—one for each livestream—and it always opens to Livestream 1 by default.

So if you’re looking for talks in Livestreams 2, 3, or 4:

  • Head to the POSETTE Schedule page
  • Click the tab for the livestream you want
  • Voila—talks for that stream

Figure 2: Screenshot of the POSETTE 2025 Schedule with separate tabs for the 4 livestreams

How to watch & how to participate on Discord

Here’s how to tune in—and how to participate in the conference.

How to watch the livestreams

  • All 4 livestreams will be watchable on the PosetteConf 2025 home page
  • Pro tip: If you’ve left the page open since the last stream, refresh your browser to see the next livestream.

How to join the virtual hallway track

  • Head to the #posetteconf channel on Discord (on the Microsoft Open Source Discord)
  • That’s where speakers and attendees hang out during the livestreams—it’s where you can ask questions, share reactions, and just say hi.

What’s new in POSETTE 2025

If you attended POSETTE last year (or back when it was called Citus Con), you might be wondering, what’s different this year?

In many ways, the POSETTE playbook is the same: useful and delightful Postgres talks in a virtual, accessible format. But here’s what’s new:

  • New website: And, a new domain too: PosetteConf.com
  • Only 2 keynotes: down from 4 keynotes last year. We’re honored that Bruce Momjian & Charles Feddersen accepted the invitation to be keynote speakers. Each keynote will be delivered twice.
  • 58% speakers new to POSETTE: 26 out of 45 speakers (58%) are brand new to POSETTE
  • New livestream hosts: 3 of the 7 livestream hosts are brand new to hosting POSETTE livestreams: welcome to Adam Wølk, Derk van Veen, & Thomas Munro
  • Same name: The POSETTE: An Event for Postgres name is here to stay—and we still love the name

Big thank you to our 45 amazing speakers

Every great event starts with great talks—and great talks start with great speakers. Want to learn more about the people behind these talks?

Figure 3: Bio pics for all 45 speakers in POSETTE: An Event for Postgres 2025, along with our gratitude.

Join us for POSETTE 2025! Mark your calendars

I hope you join us for POSETTE 2025. Consider yourself officially invited. As part of the talk selection team, I’m definitely biased—but I truly believe these speakers and talks are worth your time.

I’ll be hosting Livestream 1 and Livestream 2 and you’ll find me in the #posetteconf Discord chat. I hope to see you there.

And please—tell your Postgres friends, so they don’t miss out!

🗓️ Add the livestreams to your calendar

Watch last year’s talks in advance: And if you want to get ready, check out the POSETTE 2024 playlist on YouTube. Lots of gems in there.

Acknowledgements & Gratitude

I’ve already thanked the amazing speakers above. In addition, thanks go to Daniel Gustafsson, Teresa Giacomini, and My Nguyen for reviewing parts of this post before publication. And of course, big thank you to the POSETTE 2025 organizing team and POSETTE talk selection team—without you, there would be no POSETTE!

Figure 4: Visual invitation to join the virtual hallway track for POSETTE 2025 on the Microsoft Open Source Discord, where you can chat with the speakers & others in the Postgres community.

This article was originally published on citusdata.com.

The Register - Security ([syndicated profile] register_security_feed) wrote 2025-06-04 03:05 pm

Fake IT support calls hit 20 orgs, end in stolen Salesforce data and extortion, Google warns

Posted by Jessica Lyons

Victims include hospitality, retail and education sectors

A group of financially motivated cyberscammers who specialize in Scattered-Spider-like fake IT support phone calls managed to trick employees at about 20 organizations into installing a modified version of Salesforce's Data Loader that allows the crims to steal sensitive data.…

Graham Cluley ([syndicated profile] grahamcluley_feed) wrote 2025-06-03 02:00 pm

The AI Fix #53: An AI uses blackmail to save itself, and threats make AIs work better

Posted by Graham Cluley

In episode 53 of The AI Fix, our hosts suspect the CEO of Duolingo has been kidnapped by an AI, Sergey Brin says AIs work better if you threaten them with physical violence, Graham wonders how you put a collar on a headless robot dog, Mark asks why kickboxing robots wear head guards, and the CEO of Anthropic says AI could wipe out entry-level jobs.

Graham asks your favourite AI how it feels about being kidnapped, and Mark explains how an AI tried to save itself by blackmailing the engineer responsible for turning it off.

All this and much more is discussed in the latest edition of "The AI Fix" podcast by Graham Cluley and Mark Stockley.
The Register - Security ([syndicated profile] register_security_feed) wrote 2025-06-04 01:35 pm

Crims stole 40,000 people's data from our network, admits publisher Lee Enterprises

Posted by Connor Jones

Did somebody say ransomware? Not the newspaper group, not even to deny it

Regional newspaper publisher Lee Enterprises says data belonging to around 40,000 people was stolen during an attack on its network earlier this year.…

"Элементы": новости науки ([syndicated profile] elementy_news_feed) wrote2025-06-04 03:58 pm

The evolution of stench: how flowers learned to smell like carrion

Some plants attract flies and other carrion-feeding insects by mimicking the smell of rotting organic matter with volatile sulfur-containing compounds. Scientists have described the molecular mechanism behind this, which arose independently in several unrelated groups of plants. It turns out that just three amino acid substitutions in a single ancient enzyme are enough.

Biz & IT – Ars Technica ([syndicated profile] arstech_it_feed) wrote2025-06-04 11:20 am

Two certificate authorities booted from the good graces of Chrome

Posted by Dan Goodin

Google says its Chrome browser will stop trusting certificates from two certificate authorities after “patterns of concerning behavior observed over the past year” diminished trust in their reliability.

The two organizations, Taiwan-based Chunghwa Telecom and Budapest-based Netlock, are among the dozens of certificate authorities trusted by Chrome and most other browsers to provide digital certificates that encrypt traffic and certify the authenticity of sites. With the ability to mint cryptographic credentials that cause address bars to display a padlock, assuring the trustworthiness of a site, these certificate authorities wield significant control over the security of the web.

Inherent risk

“Over the past several months and years, we have observed a pattern of compliance failures, unmet improvement commitments, and the absence of tangible, measurable progress in response to publicly disclosed incident reports,” members of the Chrome security team wrote Tuesday. “When these factors are considered in aggregate and considered against the inherent risk each publicly-trusted CA poses to the internet, continued public trust is no longer justified.”

Schneier on Security ([syndicated profile] bruce_schneier_feed) wrote 2025-06-04 11:00 am

The Ramifications of Ukraine’s Drone Attack

Posted by Bruce Schneier

You can read the details of Operation Spiderweb elsewhere. What interests me are the implications for future warfare:

If the Ukrainians could sneak drones so close to major air bases in a police state such as Russia, what is to prevent the Chinese from doing the same with U.S. air bases? Or the Pakistanis with Indian air bases? Or the North Koreans with South Korean air bases? Militaries that thought they had secured their air bases with electrified fences and guard posts will now have to reckon with the threat from the skies posed by cheap, ubiquitous drones that can be easily modified for military use. This will necessitate a massive investment in counter-drone systems. Money spent on conventional manned weapons systems increasingly looks to be as wasted as spending on the cavalry in the 1930s.

The Atlantic makes similar points.

There’s a balance between the cost of the thing, and the cost to destroy the thing, and that balance is changing dramatically. This isn’t new, of course. Here’s an article from last year about the cost of drones versus the cost of top-of-the-line fighter jets. If $35K in drones (117 drones times an estimated $300 per drone) can destroy $7B in Russian bombers and other long-range aircraft, why would anyone build more of those planes? And we can have this discussion about ships, or tanks, or pretty much every other military vehicle. And then we can add in drone-coordinating technologies like swarming.

Clearly we need more research on remotely and automatically disabling drones.

Planet PostgreSQL ([syndicated profile] planetposgresql_feed) wrote 2025-05-29 12:45 pm

Ahmet Gedemenli: pgstream v0.6.0: Template transformers, observability, and performance improvements

Learn how pgstream v0.6 simplifies complex data transformations with custom templates, enhances observability and improves snapshot performance.
Planet PostgreSQL ([syndicated profile] planetposgresql_feed) wrote 2025-05-27 07:31 am

warda bibi: How to Upgrade Major PostgreSQL Versions: A Practical Production Guide

PostgreSQL versions follow a well-defined five-year support lifecycle. Each major release receives bug fixes, security patches, and minor updates for five years from its initial release date. After that point, the version reaches end-of-life (EOL) and no longer receives official updates.

Staying on an EOL version exposes your systems to security risks, potential compatibility issues, and missing performance improvements introduced in later releases. You can always check the current support status of PostgreSQL versions on the official PostgreSQL Versioning Policy page.

Upgrading to the latest version ensures long-term stability, access to new features, and better support. Recently, I worked on upgrading a critical production PostgreSQL environment from version 11 to 15. Version 15 was chosen because the client’s application had only been tested up to that release.  The system supported large batch workloads and live applications, so we had to be meticulous. While this article draws from that specific project, the steps are broadly applicable to anyone planning a major PostgreSQL upgrade, especially when crossing several versions.

This guide outlines a generalized, production-ready approach for performing major version upgrades using the pg_dump/pg_restore method.

Upgrade Methods

PostgreSQL provides two primary upgrade options, each with distinct advantages.

1. In-place upgrade using pg_upgrade
This method is designed for rapid transitions and minimal downtime. It upgrades the system catalog in place and reuses existing data files, making it highly efficient. However, it requires careful compatibility checks, especially around tablespaces, file system layout, and extensions.

2. Logical upgrade using pg_dump and pg_restore
This method involves exporting the database schema and data from the old cluster and importing them into a new one. While it involves longer downtime and more disk I/O, it avoids binary compatibility issues and is well-suited for multi-version jumps and cross-platform migrations.

If you have a downtime window and are upgrading across multiple versions, the dump/restore method is often the simpler and safer path. In our case, we had a one-day downtime window and also needed to migrate to a new server, so using the pg_dump/pg_restore method was the most practical and reliable approach. It gave us full control over the migration process and allowed us to verify the restored data and performance on the new instance before final cutover.

Pre-Upgrade Preparation

A major PostgreSQL version upgrade can be performed either on the same host or by migrating to a different server. In our case, we opted for a two-server setup:

  • Source: PostgreSQL 11 (actively serving the application)
  • Target: PostgreSQL 15 (fresh install on a separate server)

At the time of migration, the application was actively connected to the PostgreSQL 11 instance. The goal of this upgrade was to migrate the database from version 11 to 15 on a new server. The migration was carried out on Red Hat Enterprise Linux 9, though the overall approach can be adapted to other operating systems depending on your environment and tooling.

Stop Application 

Prior to the upgrade, all client connections, batch jobs, and scheduled processes must be stopped. This guarantees a consistent state and prevents any post-dump changes from being lost. Application access to the source database should be disabled entirely for the duration of the backup.
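
As a sketch of one way to enforce this (assuming a superuser connection to the source server; <database> is a placeholder), any sessions that remain after the applications and jobs have been stopped can be terminated:

-- Terminate leftover client sessions on the source database (run as superuser).
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = '<database>'
  AND pid <> pg_backend_pid();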

Prepare Target Server

If PostgreSQL is not yet installed on the target server, you’ll need to set it up before proceeding with the upgrade. The following instructions demonstrate how to install PostgreSQL 15 on a Red Hat 9 system. You may adjust the version number as needed based on your upgrade target.

First, install the official PostgreSQL repository and disable the system’s default PostgreSQL module:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

sudo dnf -qy module disable postgresql

Next, install the PostgreSQL server package for your desired version:

sudo dnf install -y postgresql15-server

Initialize the database cluster and configure the service to start automatically on boot:

sudo /usr/pgsql-15/bin/postgresql-15-setup initdb

sudo systemctl enable postgresql-15

sudo systemctl start postgresql-15

Some applications rely on specific PostgreSQL extensions that must be installed on the target server prior to restoration. During the restore process, you may encounter warnings or errors if these extensions were present in the source database but are missing from the target environment.

In our case, the only extension we were using was pg_stat_statements, which did not impact the restore itself and could safely be added afterward. However, if your application or schema depends on certain extensions (for custom functions, data types, or triggers), it’s important to ensure those extensions are available before the restore begins to avoid failures or broken dependencies.
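
One simple way to build that inventory (a sketch; run it against the source database before the upgrade) is to list the installed extensions and their versions:

-- List extensions installed in the source database.
SELECT extname, extversion FROM pg_extension ORDER BY extname;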

Take Dump – Using the target version tools

To ensure compatibility during a major version upgrade, it is strongly recommended to use the pg_dump and pg_dumpall binaries from the target PostgreSQL version (in our case, version 15). This helps avoid potential issues that can arise from using outdated dump formats when restoring to a newer server. If the target binaries are not already available on the source (older) server, you can install just the client tools without the server package using the following command:

sudo dnf install -y postgresql15

If installing PostgreSQL 15 tools on the source server is not possible due to system constraints or compatibility issues, you can run the dump commands remotely from the target server (or any server that has PostgreSQL 15 binaries installed), using the -h flag to connect to the source database over the network. In our scenario, we encountered compatibility issues while trying to install PostgreSQL 15 tools on the production server. Instead, we executed both dump commands remotely from the target Red Hat 9 server using PostgreSQL 15 binaries. This approach worked reliably after setting a password for the postgres user to allow authenticated remote access. 

Export the main database using custom format:

/usr/pgsql-15/bin/pg_dump -Fc -h <source-host> -U postgres -d <database> -f /path/to/backup.dump

The custom format is recommended because it allows greater control during restoration such as selective restores and parallelism, etc.  Note that backup time will vary depending on database size and hardware. In our case, backing up an 800 GB database took approximately two hours on moderately provisioned infrastructure.

Next, export global objects such as roles, tablespaces, and ownership metadata separately:

/usr/pgsql-15/bin/pg_dumpall -g -h <source-host> -U postgres > /path/to/globals.sql

Once the backup is complete, copy both files to the target server. Additionally, store a copy of both on a separate host (outside of the source and target environments) to serve as a recovery fallback in case of unexpected failure during the upgrade process.
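
For example (the host names and paths below are placeholders, and scp is just one option), the copy step might look like:

# Copy the dump and the globals file to the target server.
scp /path/to/backup.dump /path/to/globals.sql postgres@<target-host>:/path/to/

# Keep an additional copy on an independent host as a recovery fallback.
scp /path/to/backup.dump /path/to/globals.sql backup@<backup-host>:/backups/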

Upgrade Execution Plan

Once the backup files have been transferred to the target server and validated, proceed with the following steps to complete the database restoration.

Begin by restoring global objects such as roles, tablespaces, and their associated privileges. These were captured using pg_dumpall -g and are essential for preserving access control and ownership:

psql -U postgres -f /path/to/globals.sql

Next, create a fresh, empty database with the same name as the original source database:

createdb -U postgres <database>

With the database shell in place, restore the main database dump using pg_restore. For improved performance, enable parallel restore mode using the -j flag, as this can greatly speed up the process. The number of parallel jobs should be adjusted based on available CPU and I/O capacity on the target system:

nohup pg_restore -U postgres -d <database> -j 4 -v /path/to/backup.dump > restore.log 2>&1 &

Using nohup allows the command to continue running in the background even if the terminal session is closed. The -v flag enables verbose output, and restore.log captures both standard output and error messages for review.

Monitor the restore.log file to track progress and check for any errors during the restoration process. Depending on the database size and server resources, this step can take significant time. In our case, the restore of an 800 GB dump completed in approximately 2.5 hours.

After the restoration is complete, run ANALYZE on the database to refresh PostgreSQL’s planner statistics. This ensures the query planner can make informed decisions based on the current data distribution:

psql -U postgres -d <database> -c "ANALYZE;"
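
Alternatively (not the method we used, but an option worth knowing), the vacuumdb utility that ships with PostgreSQL can build the statistics incrementally, which gives the planner coarse statistics sooner on very large databases:

# Optional: analyze in stages so the planner gets rough statistics quickly.
/usr/pgsql-15/bin/vacuumdb -U postgres -d <database> --analyze-in-stages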

Install Required Extensions 

For extensions provided by PostgreSQL's contrib modules, such as pg_stat_statements, you must first install the appropriate package.

sudo dnf install -y postgresql15-contrib

Next, configure PostgreSQL to preload the extension by modifying the postgresql.conf file:

shared_preload_libraries = 'pg_stat_statements'

After updating the configuration, restart the PostgreSQL service for changes to take effect:

sudo systemctl restart postgresql-15

Finally, enable the extension within the database:

psql -U postgres -d <database> -c "CREATE EXTENSION pg_stat_statements;"

To verify that the extension is successfully installed and active, connect to the database and run:

\dx

This command lists all extensions installed in the current database. You should see pg_stat_statements or any others you’ve enabled in the output.

Validate Schema and Structural Integrity

After restoring the database, it is important to validate that the schema and object structure match the original environment. Start by verifying that the number and types of database objects (tables, indexes, views, etc.) match the expected counts. To do this effectively, ensure that you have captured and stored the corresponding object counts from the original production database (source version) prior to the upgrade.

You can run queries like the following to review object distributions by type:

SELECT
    n.nspname AS schema_name,
    CASE
        WHEN c.relkind = 'r' THEN 'TABLE'
        WHEN c.relkind = 'i' THEN 'INDEX'
        WHEN c.relkind = 'S' THEN 'SEQUENCE'
        WHEN c.relkind = 't' THEN 'TOAST TABLE'
        WHEN c.relkind = 'v' THEN 'VIEW'
        WHEN c.relkind = 'm' THEN 'MATERIALIZED VIEW'
        WHEN c.relkind = 'c' THEN 'COMPOSITE TYPE'
        WHEN c.relkind = 'f' THEN 'FOREIGN TABLE'
        WHEN c.relkind = 'p' THEN 'PARTITIONED TABLE'
        WHEN c.relkind = 'I' THEN 'PARTITIONED INDEX'
        ELSE 'OTHER'
    END AS object_type,
    COUNT(*) AS count
FROM
    pg_class c
JOIN
    pg_namespace n ON c.relnamespace = n.oid
WHERE
    n.nspname IN ('public')
GROUP BY
    n.nspname, object_type
ORDER BY
    n.nspname, object_type;

This query aggregates object counts grouped by schema and object type using pg_class and pg_namespace. By default, the WHERE clause filters for the public schema. You can either replace ‘public’ with a specific schema name you want to inspect or remove the WHERE clause entirely to include all schemas.

You may also run count checks on critical tables and compare key constraint definitions. Be aware that some catalog-level differences between PostgreSQL versions may lead to minor, expected variations in metadata.
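
As a sketch of such a check (the counts below come from the statistics collector, so they are approximate and only meaningful after the ANALYZE step above; use COUNT(*) on critical tables for exact figures), the same query can be compared on the source and target servers:

-- Approximate per-table row counts, largest tables first.
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;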

Application Cutover

Once the database has been restored, validated, and tested, the final step is to point the application to the new PostgreSQL server.

Update the application’s connection strings or service configurations to reference the new database host, port, and credentials. On the PostgreSQL side, update the pg_hba.conf file to allow connections from the application hosts, ensuring that appropriate authentication methods are used. Also verify the listen_addresses and port settings in postgresql.conf to confirm that the database is accessible from external systems.
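
As an illustrative sketch (the address range, user name, and authentication method below are placeholders; adjust them to your environment), the relevant settings might look like this:

# postgresql.conf on the new server: accept connections beyond localhost.
listen_addresses = '*'
port = 5432

# pg_hba.conf: allow the application hosts to reach the migrated database.
# TYPE  DATABASE      USER       ADDRESS         METHOD
host    <database>    app_user   10.0.0.0/24     scram-sha-256

After editing pg_hba.conf, reload the PostgreSQL configuration for the changes to take effect.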

Rollback Strategy

Having a rollback plan is essential for any major database upgrade. The rollback approach will differ depending on whether the issue occurs during the upgrade window or after the application has gone live on the new system.

If Issues Occur During the Upgrade Window

Plan A: Redirect the application back to the original PostgreSQL production server. Since no writes would have taken place on the new server at this stage, this provides a clean and immediate fallback with minimal risk.

Plan B: If the original production server is inaccessible or compromised, restore the most recent logical backup (ideally stored on a separate, secure host) to a recovery server. This ensures that a known, consistent version of the database remains accessible, even in the event of infrastructure failure.

If Issues Occur After Go-Live (Writes Have Occurred)

Plan A: Resolve the issue directly on the new PostgreSQL instance. This is the preferred approach, as it preserves any new data written since the cutover and avoids complex recovery operations.

Plan B: Revert to the old PostgreSQL server. This is a last-resort option and involves identifying and manually transferring any data that was created or modified on the new server back to the old environment. This process is time-consuming and introduces risk, and should only be considered when all other remediation efforts have failed.

Mitigation Strategy

To reduce the risk of data inconsistency and simplify rollback procedures, it is advisable to initially run the new PostgreSQL instance in read-only mode after the upgrade. This allows for application-level validation in a production-like environment without making irreversible changes. Once the application has been fully tested and confirmed stable, read/write access can be enabled, completing the transition.
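
A minimal sketch of that guard rail (note that it only affects new sessions and a session can still override it, so it is a convention rather than hard enforcement):

-- Default all new sessions on the restored database to read-only transactions.
ALTER DATABASE <database> SET default_transaction_read_only = on;

-- Once validation is complete, switch the database back to read/write.
ALTER DATABASE <database> SET default_transaction_read_only = off;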

Summary

Upgrading between major PostgreSQL versions requires careful planning. Always test your upgrade process in a staging environment before performing it on production. In our upgrade from PostgreSQL 11 to 15, we prioritized safety and transparency. This approach allowed us to validate the schema, minimize risk, and transition to a supported version with confidence.

Choose the upgrade method that best aligns with your environment, downtime tolerance, and operational requirements.

The post How to Upgrade Major PostgreSQL Versions: A Practical Production Guide appeared first on Stormatics.