Apache Ignite 2.14: Getting "partition data has been lost" error for ignite-sys-atomic-cache

I have an Apache Ignite 2.14 cluster of 3 nodes running on Kubernetes. All my caches have one backup copy.

After I enabled persistence on the default data region a couple of months ago, I started getting the exception CacheInvalidStateException: Failed to execute the cache operation (all partition owners have left the grid, partition data has been lost) whenever one or two nodes restarted, either because of a deployment or for some other reason.

It was worrying, but I learned to fix it by running control.sh --cache reset_lost_partitions cacheName.
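A minimal sketch of that invocation when several caches are affected, assuming control.sh is on the PATH and that the cache names are passed as one comma-separated argument. The helper name reset_lost_cmd, the CONTROL_SH variable, and the cache names are hypothetical; the function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: prints the control.sh command for the given caches.
# CONTROL_SH is an assumed variable; it falls back to plain "control.sh".
reset_lost_cmd() {
    # Join all arguments with commas, then strip the trailing comma.
    caches=$(printf '%s,' "$@")
    caches=${caches%,}
    printf '%s --cache reset_lost_partitions %s\n' "${CONTROL_SH:-control.sh}" "$caches"
}

# Example with two placeholder cache names.
reset_lost_cmd myCache1 myCache2
```

Printing instead of executing keeps the sketch safe to try on a machine that has no cluster access.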

This time, after two nodes restarted because of a transient failure, I started getting an error that I couldn't fix with that command:

Caused by: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute the cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=ignite-sys-atomic-cache@default-ds-group, partition=985, key=UserKeyCacheObjectImpl [part=985, val=GridCacheInternalKeyImpl [name=alias, grpName=default-ds-group], hasValBytes=true]]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateKey(GridDhtTopologyFutureAdapter.java:214)
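To see at a glance which cache and partition such a message refers to, the two fields can be pulled out of a log line. This is a rough sketch that assumes the message layout shown above ([cacheName=..., partition=..., ...]); the function name lost_partition_info is hypothetical, and the sample message is an abbreviated copy of the one above.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints
# "<cacheName> <partition>" for each line with the lost-partition details.
lost_partition_info() {
    sed -n 's/.*\[cacheName=\([^,]*\), partition=\([0-9][0-9]*\),.*/\1 \2/p'
}

# Abbreviated sample of the exception message (key details shortened).
msg='partition data has been lost) [cacheName=ignite-sys-atomic-cache@default-ds-group, partition=985, key=UserKeyCacheObjectImpl [part=985]]'
printf '%s\n' "$msg" | lost_partition_info
```

Lines that do not match the pattern produce no output, so the function can be fed a whole log file.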

It looks like this time the issue involves a system cache, ignite-sys-atomic-cache@default-ds-group. I guess it is related to the AtomicSequence object that I use in the application to generate IDs. The error occurs exactly when I try to use AtomicLong.

My questions are:

  • Why might this happen?
  • Is it possible to fix it without destroying the cluster and reloading all the data from scratch (which would take a day or two)?
  • How can I prevent similar issues in the future?

Thank you in advance!

P.S. On GridGain Portal the following error is reported: Cache [default-ds-group] has zero partition copies.

