[JBoss JIRA] (JGRP-2320) FILE_PING.findMembers() optimizations
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2320?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2320:
---------------------------
Fix Version/s: 4.1.3
(was: 4.1.2)
> FILE_PING.findMembers() optimizations
> -------------------------------------
>
> Key: JGRP-2320
> URL: https://issues.jboss.org/browse/JGRP-2320
> Project: JGroups
> Issue Type: Enhancement
> Affects Versions: 3.6.16, 4.0.15
> Reporter: Nick Sawadsky
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 4.1.3
>
>
> Following on from JGRP-2288, a couple of possible optimizations/improvements to {{FILE_PING.findMembers()}} were identified.
> 1. After the initial call to {{readAll()}}, some corrective steps are taken if the local node address was not returned by {{readAll()}}. However, in the case where {{findMembers()}} is invoked by {{TP.fetchResponsesFromDiscoveryProtocol()}}, it is normal if the local node address is not returned, since the {{readAll()}} responses are filtered based on the {{members}} parameter.
> To avoid unnecessary writes to the file or cloud store, it would be good to add some checks based on whether {{members}} is null or not. For example, the calls to {{write()}} and {{writeAll()}} should probably not occur unless {{members}} is null.
> 2. In the call to {{sendDiscoveryResponse()}}, the last parameter is always {{false}}. However, it seems possible for a coordinator to get to this point in some edge cases. Though I haven't been able to identify any clear bugs that this would lead to, it might be better to pass {{is_coord}} as the last parameter.
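The rule proposed in point 1 can be sketched as follows. This is a minimal illustration, not the JGroups code: the class `DiscoveryWriteGuard`, the method `shouldWriteOwnData` and the use of plain strings as addresses are hypothetical. It only captures the idea that a corrective write is warranted when the read was unfiltered (`members` is null) and the local address is still missing.

```java
import java.util.List;

/** Hypothetical illustration of the proposed check; not JGroups code. */
final class DiscoveryWriteGuard {

    /**
     * Decides whether the local node's data should be written to the store again.
     *
     * @param members   the filter passed to findMembers(); null means "find all members"
     * @param readAddrs addresses returned by the (possibly filtered) read of the store
     * @param localAddr the local node's address
     */
    static boolean shouldWriteOwnData(List<String> members, List<String> readAddrs, String localAddr) {
        // With a non-null filter, a missing local address is expected: skip the write
        if (members != null)
            return false;
        // Unfiltered read: write only if the local entry really is missing
        return !readAddrs.contains(localAddr);
    }

    public static void main(String[] args) {
        System.out.println(shouldWriteOwnData(null, List.of("B", "C"), "A"));    // true
        System.out.println(shouldWriteOwnData(List.of("B"), List.of("B"), "A")); // false
        System.out.println(shouldWriteOwnData(null, List.of("A", "B"), "A"));    // false
    }
}
```

Whether the check belongs in one place or around each of `write()` and `writeAll()` is left open by the issue; the sketch assumes a single decision point.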
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
6 years, 9 months
[JBoss JIRA] (JGRP-2273) ASYM_ENCRYPT: deprecate encrypt_entire_message
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2273?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2273:
---------------------------
Fix Version/s: 4.1.3
(was: 4.1.2)
> ASYM_ENCRYPT: deprecate encrypt_entire_message
> ----------------------------------------------
>
> Key: JGRP-2273
> URL: https://issues.jboss.org/browse/JGRP-2273
> Project: JGroups
> Issue Type: Enhancement
> Reporter: Bela Ban
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.3
>
>
> In {{ASYM_ENCRYPT}}, {{encrypt_entire_message}} encrypts not only the payload, but also metadata such as destination and sender's address, headers and flags.
> The rationale was to prevent replay attacks. However, this is not an issue, as replayed messages will simply get dropped by the retransmission layer (e.g. NAKACK2 or UNICAST3).
> If people still want this feature, they can write a protocol _above_ {{ASYM_ENCRYPT}}, which serializes the entire message into the payload of a new message, and this would be exactly the same as setting {{encrypt_entire_message}} to {{true}}.
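The wrapping idea in the last paragraph can be sketched in plain Java. This is only an illustration under stated assumptions: `SimpleMessage` is a hypothetical stand-in for a message (it is not the JGroups `Message` class), headers are omitted, and the wire layout is invented for the example. The point is that metadata and payload end up in one byte array, which a protocol above {{ASYM_ENCRYPT}} could set as the payload of a new message so that the metadata gets encrypted along with it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of wrapping a whole message into a payload; not JGroups code. */
final class WrappedMessageSketch {

    /** Simplified stand-in for a message: destination, sender, flags and payload only. */
    static final class SimpleMessage {
        final String dest;
        final String sender;
        final short flags;
        final byte[] payload;

        SimpleMessage(String dest, String sender, short flags, byte[] payload) {
            this.dest = dest;
            this.sender = sender;
            this.flags = flags;
            this.payload = payload;
        }
    }

    /** Serializes metadata and payload into one byte array (the payload of a new message). */
    static byte[] wrap(SimpleMessage msg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(msg.dest);
            out.writeUTF(msg.sender);
            out.writeShort(msg.flags);
            out.writeInt(msg.payload.length);
            out.write(msg.payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores the original message from the wrapped payload on the receiving side. */
    static SimpleMessage unwrap(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String dest = in.readUTF();
            String sender = in.readUTF();
            short flags = in.readShort();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new SimpleMessage(dest, sender, flags, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleMessage original = new SimpleMessage("B", "A", (short) 3, new byte[] {1, 2, 3});
        SimpleMessage restored = unwrap(wrap(original));
        System.out.println(restored.dest + " " + restored.sender + " " + restored.flags);
    }
}
```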
--
[JBoss JIRA] (JGRP-2362) Providing logical member name in JDBC_PING
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2362?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2362:
---------------------------
Fix Version/s: 4.1.3
(was: 4.1.2)
> Providing logical member name in JDBC_PING
> ------------------------------------------
>
> Key: JGRP-2362
> URL: https://issues.jboss.org/browse/JGRP-2362
> Project: JGroups
> Issue Type: Feature Request
> Affects Versions: 4.0.17, 4.0.18, 4.0.19, 4.1.0, 4.0.20
> Reporter: S Pokutniy
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 4.1.3
>
>
> When using JDBC_PING with logical names instead of UUIDs, if a cluster member that is not the coordinator crashes or gets killed, its dataset remains in the database until the coordinator changes (independently of remove_old_coords_on_view_change / remove_all_data_on_view_change). If the cluster is then restarted, the old dataset makes connect() much slower (+30 seconds), as the members seem to be trying to connect to it. The parameter remove_all_data_on_view_change seems to be the solution, but it has no effect as long as the coordinator does not change, so in practice it behaves the same as remove_old_coords_on_view_change.
> The only solution seems to be to provide an appropriate delete statement in the initialize_sql parameter, which would delete the old entry, for example: delete from JGROUPSPING where ping_data like '%logical name%'. However, this is neither quick nor ideal, as the datatype of ping_data is bytea or bit varying.
> It would be great to also have the logical name in JGROUPSPING, inserted by default in insert(). This would be easy to implement, as the information is accessible in PingData.
--
[JBoss JIRA] (JGRP-2327) UNICAST3: create receiver table when non-first message is received first
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2327?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2327:
---------------------------
Fix Version/s: 4.1.4
(was: 4.1.3)
> UNICAST3: create receiver table when non-first message is received first
> ------------------------------------------------------------------------
>
> Key: JGRP-2327
> URL: https://issues.jboss.org/browse/JGRP-2327
> Project: JGroups
> Issue Type: Enhancement
> Reporter: Bela Ban
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.4
>
>
> * A and B
> * B sends 2 messages to A: B1 and B2
> * A receives B2 _before_ B1 (both B1 and B2 are OOB)
> * The current code has A drop B2 and send a SEND_FIRST_SEQNO message to B
> * B resends B1, but _not_ B2
> * Retransmission needs to kick in before A receives B2. This might take up to {{xmit_interval * 2}} ms before B2 is retransmitted and delivered
> h4. Scenario (JGRP-2293):
> * A installs a new view
> * B sends a LEAVE-REQ to A (B is leaving, too) on the view installation and a VIEW-ACK for the view. Both messages are unicasts to A and OOB
> * The VIEW-ACK (B2) is received first and dropped, so it will have to be retransmitted
> * This delays the view installation, as A waits for {{view_ack_collection_timeout}} ms until it has received all VIEW-ACKs
> h4. Workaround
> * Set GMS.leave_timeout to a higher value (say 8000ms)
> h4. Solution
> * When B2 is received, and it is not the first message and we don't have a receiver table for B yet, investigate whether we can create the receiver table anyway
> * However, this requires the first seqno from B *to always be 0*
> --> Investigate whether the first seqno in UNICAST3 is always 0, then this solution will work
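The proposed solution can be sketched with a small receiver-side buffer. This is a simplified illustration, not the UNICAST3 implementation: `ReceiverTableSketch` and its methods are hypothetical, it assumes the first seqno is always 0 (the very point the issue says still needs investigating), and it delivers strictly in seqno order, ignoring OOB delivery semantics. It shows that when B2 arrives first, it can be kept instead of dropped, so no retransmission of B2 is needed once B1 arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical receiver-side buffer for one sender; not the UNICAST3 implementation. */
final class ReceiverTableSketch {

    /** Assumption taken from the issue: every sender starts at seqno 0. */
    static final long FIRST_SEQNO = 0;

    private final TreeMap<Long, String> buffered = new TreeMap<>();
    private long nextToDeliver = FIRST_SEQNO;

    /** Adds a message and returns everything that can now be delivered in seqno order. */
    List<String> receive(long seqno, String msg) {
        List<String> deliverable = new ArrayList<>();
        if (seqno < nextToDeliver)
            return deliverable; // already delivered: ignore the duplicate
        buffered.putIfAbsent(seqno, msg);
        // Deliver the contiguous run starting at the next expected seqno
        while (!buffered.isEmpty() && buffered.firstKey() == nextToDeliver) {
            deliverable.add(buffered.pollFirstEntry().getValue());
            nextToDeliver++;
        }
        return deliverable;
    }

    public static void main(String[] args) {
        ReceiverTableSketch table = new ReceiverTableSketch(); // created on the first message seen
        System.out.println(table.receive(1, "B2")); // [] (B2 is kept, not dropped)
        System.out.println(table.receive(0, "B1")); // [B1, B2]
    }
}
```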
--
[JBoss JIRA] (JGRP-2360) DeadLock while acquiring a distributed lock consecutively by the same thread in a loop
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2360?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2360.
----------------------------
Resolution: Done
Duplicate of JGRP-2364. Can you re-test with JGroups master and let me know if this fixes your issue? Please re-open this issue should the problem persist.
> DeadLock while acquiring a distributed lock consecutively by the same thread in a loop
> --------------------------------------------------------------------------------------
>
> Key: JGRP-2360
> URL: https://issues.jboss.org/browse/JGRP-2360
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.18, 4.1.1
> Environment: JGroups-4.1.1-Final
> Red Hat 4.4.7-23
> JDK 1.8.0_202
> Reporter: Daniel Klosinski
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.2
>
> Attachments: DLTest.java, DistributedLockRepoducer.zip, log.log
>
>
> Deadlock intermittently happens when the same thread in the same VM acquires a distributed lock consecutively in a loop. Here is a code snippet for which this issue can occur:
> {code}
> for(String s : list) {
>     Lock lock=lock_service.getLock("test_lock_name");
>     lock.lock();
>     // perform business logic
>     lock.unlock();
> }
> {code}
> Running such a loop, I get a deadlock after a few iterations. In the attached logs, the program hung after 3 iterations.
> During troubleshooting, I found that lock_id is not incremented for the new distributed lock. The first two loop iterations were fine, but in the third iteration lock_id was not increased:
> {code}
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: RELEASE_LOCK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: RELEASE_LOCK_OK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: RELEASE_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: RELEASE_LOCK_OK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: CREATE_LOCK[test_lock_name, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> {code}
> I added a few extra log statements to the JGroups 4.1.1.Final code and found that the second client lock was not removed from the client lock table before the third client lock was created. The issue lies in the piece of code below. The owner consists of the address and the thread ID. If the same thread on the same VM acquires the distributed lock consecutively and an entry for the same owner already exists in the client lock table, no new lock is created; the old client lock is used to acquire the new distributed lock:
> {code}
> protected synchronized ClientLock getLock(String name, Owner owner, boolean create_if_absent) {
>     Map<Owner,ClientLock> owners=table.get(name);
>     if(owners == null) {
>         if(!create_if_absent)
>             return null;
>         owners=Util.createConcurrentMap(20);
>         Map<Owner,ClientLock> existing=table.putIfAbsent(name,owners);
>         if(existing != null)
>             owners=existing;
>     }
>     ClientLock lock=owners.get(owner);
>     if(lock == null) {
>         if(!create_if_absent)
>             return null;
>         lock=createLock(name, owner);
>         owners.put(owner, lock);
>     }
>     return lock;
> }
> {code}
> I believe this issue was introduced by the fix for JGRP-2234 and is caused by a race condition. The logic that deletes the client lock from the client lock table is now executed when the client's VM receives the RELEASE_LOCK_OK message from the coordinator. Previously, this deletion was executed by the thread that called unlock(). Now it is executed by a separate thread that handles RELEASE_LOCK_OK from the coordinator, which is why there is a race condition. Here is a sequence that leads to the deadlock:
> 1. Create client lock (lock_id=2)
> 2. Send GRANT_LOCK (lock_id=2) to coordinator
> 3. Receive LOCK_GRANTED (lock_id=2) from coordinator
> 4. Send RELEASE_LOCK (lock_id=2) to coordinator
> 5. Call the lock() method in the same thread (a new client lock won't be created, as there is an existing entry in the client lock table for this owner)
> 6. Receive RELEASE_LOCK_OK and delete client lock from client lock table.
> 7. Send GRANT_LOCK (lock_id=2) to coordinator
> 8. Receive LOCK_GRANTED (lock_id=2) from coordinator
> 9. No entry in the client lock table. It's not possible to get the thread which needs to be notified.
> I am attaching a simple program that can be used to reproduce the issue, along with the generated logs.
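The nine-step sequence above can be replayed deterministically in a single thread. The sketch below is not the JGroups locking code and is not the attached reproducer: `ClientLockRaceSketch`, its methods and the simplified lock-id counter are hypothetical, and the owner is modelled as a plain string. It only shows why the second LOCK_GRANTED finds no entry once RELEASE_LOCK_OK has removed the reused client lock.

```java
import java.util.HashMap;
import java.util.Map;

/** Deterministic replay of the reported interleaving; hypothetical names, not JGroups code. */
final class ClientLockRaceSketch {

    static final class ClientLock {
        final int lockId;
        ClientLock(int lockId) { this.lockId = lockId; }
    }

    private final Map<String, ClientLock> table = new HashMap<>(); // owner -> client lock
    private int nextLockId = 1;

    /** Get-or-create: an existing entry for the owner is reused, as in getLock() above. */
    ClientLock getLock(String owner) {
        return table.computeIfAbsent(owner, o -> new ClientLock(nextLockId++));
    }

    /** What the receiver thread does when RELEASE_LOCK_OK arrives. */
    void onReleaseLockOk(String owner) {
        table.remove(owner);
    }

    /** What happens when LOCK_GRANTED arrives: look up the lock whose waiter must be notified. */
    ClientLock onLockGranted(String owner) {
        return table.get(owner);
    }

    /** Returns true if the second grant finds no client lock, i.e. the waiter is never notified. */
    static boolean replay() {
        ClientLockRaceSketch s = new ClientLockRaceSketch();
        String owner = "svc-2::1";
        ClientLock first = s.getLock(owner);          // step 1: create client lock
        s.onLockGranted(owner);                       // steps 2-3: GRANT_LOCK sent, LOCK_GRANTED received
        // step 4: RELEASE_LOCK sent; RELEASE_LOCK_OK has not arrived yet
        ClientLock second = s.getLock(owner);         // step 5: lock() again reuses the old entry
        s.onReleaseLockOk(owner);                     // step 6: entry removed by the receiver thread
        ClientLock toNotify = s.onLockGranted(owner); // steps 7-8: second grant arrives
        return first == second && toNotify == null;   // step 9: nobody to notify
    }

    public static void main(String[] args) {
        System.out.println("waiter lost: " + replay()); // waiter lost: true
    }
}
```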
--